ABSTRACT

We present a galaxy group catalogue spanning the redshift range 0.1 < z < 1 in the ~ 1.7 deg$^2$
COSMOS field, based on the first ~ 10,000 zCOSMOS spectra. The performance of both the Friends-of-Friends
(FOF) and Voronoi-Delaunay-Method (VDM) approaches to group identification has been
extensively explored and compared using realistic mock catalogues. We find that the performance 
improves substantially if groups are found by progressively optimizing the group-finding parameters 
for successively smaller groups, and that the highest fidelity catalogue, in terms of completeness and 
purity, is obtained by combining the independently created FOF and VDM catalogues. The final
completeness and purity of this catalogue, both in terms of the groups and of individual members,
compare favorably with recent results in the literature. The current group catalogue contains 102
groups with N ≥ 5 spectroscopically confirmed members, with a further ~ 700 groups with 2 ≤ N ≤ 4.
Most of the groups can be assigned a velocity dispersion and a dark-matter mass derived from the 
mock catalogues, with quantifiable uncertainties. The fraction of zCOSMOS galaxies in groups is 
about 25% at low redshift and decreases toward ~ 15% at z ~ 0.8. The zCOSMOS group catalogue 
is broadly consistent with that expected from the semi-analytic evolution model underlying the mock 
catalogues. Not least, we show that the number density of groups with a given intrinsic richness 
increases from redshift z ~ 0.8 to the present, consistent with the hierarchical growth of structure. 
Subject headings: catalogs — galaxies: clusters: general — galaxies: high-redshift — methods: data 
analysis 
1. INTRODUCTION

Groups and clusters of galaxies are the most massive 
virialized structures in the Universe. They are impor- 
tant for several reasons. First, groups and clusters define 
the environment in which most galaxies actually reside 
and in which we may expect many important processes
determining the evolution of galaxies to take place (e.g., Voit 2005).
Studying the properties of galaxies in groups at different 
redshifts is a direct probe of how the local environment 
affects the formation and evolution of galaxies with cos- 
mic time. Second, characterization of galaxies in groups 
provides information about the galactic content of dark 
matter (DM) halos. This yields statistical quantities such 
as the halo occupation distribution (e.g., Collister & La- 
hav 2005) or the conditional luminosity function (e.g., 
Yang et al. 2008) which themselves yield useful con- 
straints on various physical processes that govern the 
formation and evolution of galaxies. Finally, the num- 
ber density and clustering of groups strongly depend on 
cosmological parameters and thus are a potentially sen- 
sitive probe of the underlying cosmological model (e.g., 
Bahcall et al. 2003; Gladders et al. 2007; Rozo et al. 
2009). 

Max Planck Institute of Astrophysics, Karl-Schwarzschild-Str. 1,
PO Box 1317, D-85748 Garching, Germany

Space Telescope Science Institute, 3700 San Martin Drive,
Baltimore, MD 21218
From an observational point of view, there are many
ways to identify a group. In the current ΛCDM
framework it is natural to associate groups with DM halos,
and this is the definition adopted by most authors.
Therefore, throughout this paper we refer to a "group"
as a set of galaxies occupying the same DM halo.

There are many different observational techniques to 
identify groups in the local and distant Universe in use 
today. Groups can be detected in the optical/near- 
infrared (NIR) (e.g., Gal 2006), by diffuse X-ray emis- 
sion (e.g., Pierre et al. 2006; Finoguenov et al. 2007), 
by the Sunyaev-Zel'dovich effect in the cosmic microwave 
background (e.g., Carlstrom et al. 2002; Voit 2005), by
particular wide-angle tailed (WAT) galaxies (e.g., Blan- 
ton et al. 2003), and through cosmic shear due to weak 
gravitational lensing (e.g., Feroz et al. 2008). Each of 
these methods has its own advantages and problems (see 
e.g. Voit 2005; Johnston et al. 2007 § 1), and the choice 
of a particular method might depend on the desired
application.

If one aims to study the galaxy population in groups, 
searching for groups directly in large optical galaxy sur- 
veys is relatively straightforward and efficient. There 
are many different methods discussed in the literature 
to identify groups in an optical survey (for a review see 
e.g. Gerke et al. 2005 § 4.1; Gal 2006). In essence, these 
aim to identify overdensities in redshift space, luminosity 
and/or color space, depending on the availability of red- 
shift information and/or photometry. Whatever method 
is used, it should conform to the following general rules 
(see e.g. Gal 2006): First, it should be based on an ob- 
jective, automated algorithm to minimize human biases. 
Second, the algorithm should impose minimal constraints 
on the physical properties of the clusters to avoid selec- 
tion biases. The latter point is especially important if 
one aims to investigate the evolution of the galaxy pop- 
ulation in groups. For instance, it has been shown that 
the addition of color information provides a powerful tool 
to find clusters in the Universe. There are methods such 
as the Cluster Red Sequence (CRS) method (Gladders & 
Yee 2000) or the maxBCG algorithm (Hansen et al. 2005, 
Koester et al. 2007a) which are based on the fact that 
the most luminous galaxies in clusters inhabit a tight se- 
quence in the color-magnitude diagram called the "red 
sequence". Using the red sequence information, these 
methods have proved to be very successful in finding clus- 
ters in the local (Koester et al. 2007b) and the distant 
Universe up to redshift z ~ 1 (Gladders & Yee 2005). A
further advantage of these methods is that no redshift
information is needed. However, the requirement of
a substantial population of galaxies inhabiting
the red sequence may impose a pre-selection that
makes evolutionary studies more difficult.

In this paper, we will not distinguish between "groups" and
"clusters", since from an optical/near-infrared point of view the
difference between groups and clusters is a gradual, quantitative
one, and not a qualitative one. So when we talk of "groups",
we do not make any assumption about the mass or other properties
of these systems.

Throughout this paper, a DM halo is operationally defined as
a friends-of-friends group of DM particles with a linking length of
b = 0.2, since this is the definition adopted in the Millennium DM
N-body Simulation (Springel et al. 2005) used for our analysis.
DM halos thus correspond to a mean overdensity of roughly 200.
Alternative practical definitions or higher overdensities would then,
in principle, correspond to different group catalogues.

The large number of accurate spectroscopic redshifts
available from the zCOSMOS redshift survey in the
COSMOS field (Scoville et al. 2007) enables us to use
the most fundamental signature of groups, namely
overdensities in redshift space, without recourse to
additional color information. Nevertheless, even with
precise spectroscopic redshifts, identifying groups in
redshift space involves certain difficulties. First, the
peculiar velocities of galaxies in groups elongate groups
in the redshift dimension
(the "fingers-of-god" effect). This effectively decreases
the galaxy density within groups in redshift space, and
thus makes them harder to detect, and may cause group
members to intermingle with other nearby field galaxies 
or even to merge into another nearby group. It is al- 
most impossible to separate interlopers from real group 
galaxies if they appear within the group in redshift space. 
Second, in magnitude-limited surveys such as zCOSMOS,
the mean density of galaxies decreases with redshift.
So any algorithm based on the distance between
neighbouring galaxies has to take into account the
dependence of the mean galaxy separation on redshift.
Third, the observational selection of galaxies (e.g.
inhomogeneous sampling rate in the spectroscopic survey)
frequently produces additional complications.

To cope with these difficulties, some forms of the traditional
Friends-of-Friends (FOF) algorithm (Huchra &
Geller 1982) are still widely used (e.g. Eke et al. 2004; 
Berlind et al. 2006), although FOF has some well known 
shortcomings (e.g. Nolthenius & White 1987; Frederic 
1995). For instance, the FOF algorithm depends sensi- 
tively on the value of the linking length, and can merge 
neighbouring groups into single big groups, or fragment 
large groups into smaller pieces. 

Until now, there have not been many spectroscopic redshift
surveys searching for groups at high redshift. Carlberg
et al. (2001) describe a group catalogue obtained
from CNOC2 in the redshift range 0.1 < z < 0.5. For the
redshift range z > 0.5 only the DEEP2 redshift survey
(Davis et al. 2003), covering a total area of ~ 3 deg$^2$ and
a redshift range of 0.7 < z < 1.4, has sufficient size and
sampling rate to identify a large number of groups in 
redshift space. To achieve this, Gerke et al. (2005, 2007) 
have adapted the Voronoi-Delaunay-Method (VDM) of 
Marinoni et al. (2002), which is claimed to compensate 
for some of the shortcomings of the traditional FOF al- 
gorithm. 

The aim of this paper is to create a group catalogue
from the ~ 10,000 spectra in the zCOSMOS 10k sample
(S. J. Lilly et al. 2009, in preparation) to enable the
study of the group population over the redshift range
0.1 ≲ z ≲ 1. We will compare the performance of both
the FOF and VDM algorithms on the 10k sample, and 
try to optimize the group-finding methods by the intro- 
duction of a "multi-run procedure". In § 2 we describe 
the 10k sample and corresponding realistic mock cata- 
logues that were generated to test the group finding al- 
gorithms. § 3 gives a detailed description of our adopted 
group-finding method, and discusses the performance of 
the two groupfinders. In § 4 we present the 10k group 
catalogue, and describe how basic group properties are 
estimated. § 5 compares the 10k group catalogue to the 
mocks and to 2dFGRS. Finally, § 6 summarizes the paper.
Where necessary, a concordance cosmology with
$H_0 = 73$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.25$, and $\Omega_\Lambda = 0.75$ is
adopted. All magnitudes are quoted in the AB system.

2. DATA 
2.1. zCOSMOS survey 

zCOSMOS is a spectroscopic redshift survey (Lilly et 
al. 2007, 2009 in preparation) covering the ~ 1.7 deg$^2$
COSMOS field (Scoville et al. 2007). The redshifts
are measured with the VLT using the VIMOS spectrograph
(Le Fevre et al. 2003). The zCOSMOS survey
is split into two parts. The first part, "zCOSMOS-bright",
is a purely magnitude-selected survey with
$15 < I_{\rm AB} < 22.5$, where $I_{\rm AB}$ is measured in the F814W
HST/ACS band (Koekemoer et al. 2007). This magnitude limit will yield a
survey of approximately 20,000 galaxies in the redshift 
range 0.1 < z < 1.2. Repeated observations of some 
zCOSMOS galaxies have shown that the redshift error 
is approximately Gaussian distributed with a standard 
deviation of ~ 100 km s$^{-1}$. The second part of
zCOSMOS, "zCOSMOS-deep", aims at observing about
10,000 galaxies in the redshift range 1.5 ≲ z < 3.0
selected through well-defined color criteria.

To date, about half of zCOSMOS-bright has been
completed, yielding about 10,500 spectra (S. J. Lilly et
al. 2009, in preparation). Among these redshifts about
15% are classified as unreliable. For the group catalogue,
we have accepted all objects with the confidence classes
4 and 3, 9.5, 9.3, 2.5, 9.4, 2.4, 1.5, and 1.4 (see S. J. Lilly
et al. 2009, in preparation). The redshifts with these 
confidence classes constitute 86% of the whole 10k sam- 
ple and have a spectroscopic confirmation rate of 98.6% 
as found by duplicate observations. After removing the 
stars (~ 5%), we finally end up with a sample of 8417 
galaxies with usable redshifts ("10k sample"). 

At the current stage of the survey, the spatial spec- 
troscopic sampling rate of galaxies across the COSMOS 
field is very inhomogeneous, and there are clearly some 
linear features such as stripes visible (see Figure 5 of S. 
J. Lilly et al. 2009, in preparation). Since this will affect 
the number of detectable groups in this sample in a non- 
trivial way, we have created mock catalogues that have 
the same kind of inhomogeneous coverage. To create the 
group catalogue and generate the statistics describing the 
fidelity of the catalogue, the groupfinders were applied to 
the whole field spanning the range 149.47° < α < 150.77°
and 1.62° ≲ δ < 2.83°. However, for some applications
discussed below we restrict ourselves to the "central region"
of the COSMOS field defined by α = 150 ± 0.4° and
δ = 2.15 ± 0.4°, since this region is relatively complete
compared to the total field: only about 25% of its area
has a completeness lower than 30%, whereas for the whole
field this fraction is more than 50%.

The number of galaxies per unit redshift, $dN_{\rm gal}/dz$, is
shown in Figure 1. There are two striking density peaks
at redshifts z ~ 0.3 and ~ 0.7.

2.2. zCOSMOS 10k mocks 

The mocks we use to calibrate and test our groupfind- 
ers are adapted from the COSMOS mock lightcones 
(Kitzbichler & White 2007). These light cones are based 
on the Millennium DM N-body simulation (Springel et al.
2005), which was run with the cosmological parameters
$\Omega_m = 0.25$, $\Omega_\Lambda = 0.75$, $\Omega_b = 0.045$, $h = 0.73$, $n = 1$, and
$\sigma_8 = 0.9$. The semi-analytic recipes for populating the
volume with galaxies in the lightcones are those of Croton
et al. (2006) as updated by De Lucia & Blaizot (2006).
There are 24 independent mocks, each covering an area
of 1.4 deg × 1.4 deg with an apparent magnitude limit of
r < 26 and galaxies in the redshift range z ≲ 7.

Fig. 1. Number of galaxies per unit redshift, $dN_{\rm gal}/dz$. The
histogram shows the $dN_{\rm gal}/dz$ of the 10k sample used in this paper.
Two large overdensities at z ~ 0.3 and z ~ 0.7 are clearly visible.
The dashed line shows the mean $dN_{\rm gal}/dz$ of the 24 mocks and the
shaded area their scatter. As noted below, the magnitude limit
in the mocks has been adapted such that the mean $dN_{\rm gal}/dz$ of
the mocks matches the smoothed $dN_{\rm gal}/dz$ of the HST/ACS COSMOS
catalogue. The shaded area shows that, although COSMOS covers
an unprecedentedly large area for a survey of this depth, cosmic
variance is still an important issue.

These lightcones were adjusted to resemble the real 10k 
sample as much as possible. First, a magnitude cut of 
$15 < I_{\rm AB} < 22.5$ was applied. However, the mean number of
galaxies in the resulting mocks was about 5-10% higher
than in the zCOSMOS target catalogue (i.e. a 1-2σ
effect). To make the mocks more closely resemble the
real data, we adjusted the magnitude cut in a
redshift-dependent way so that the mean number of galaxies per
unit redshift $dN_{\rm gal}(z)/dz$ in the mocks was equal to the
smoothed $dN_{\rm gal}(z)/dz$ of the zCOSMOS input target
catalogue (see Figure 1). Then, the spatial sampling
completeness and the redshift success rate were simulated by
removing galaxies from the mocks according to the prob- 
ability that a galaxy with a certain position and redshift 
would have been observed in the 10k sample. It should 
be noted that zCOSMOS is a slit-based survey. However,
the bias against close neighbours, already small
because of the multiple passes (up to 8 in the central
region) across the field, is further mitigated by galaxies
appearing serendipitously in slits targeted at other
galaxies (see P. Kampczyk et al. 2009, in preparation). The
small variation in sampling rate on these small scales, 
which is anyway well below the mean intergalactic sep- 
aration in 3-d space, has been ignored in constructing 
the mocks. To further enhance the conformity with the 
10k sample, the redshift of each galaxy was perturbed 
by an amount drawn from a Gaussian distribution with 
standard deviation $\sigma_z = 100\,(1 + z)/c$ km s$^{-1}$.
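The redshift perturbation described above can be illustrated with a minimal Python sketch, assuming a Gaussian error of width $\sigma_z = \sigma_v (1 + z)/c$ with $\sigma_v = 100$ km s$^{-1}$. The function names, the random seed, and the example redshifts are hypothetical and chosen for illustration only; they are not taken from the actual mock pipeline.

```python
import random

C_KM_S = 299792.458  # speed of light in km/s


def redshift_sigma(z, sigma_v_km_s=100.0):
    """Redshift error corresponding to a velocity error sigma_v at redshift z."""
    return sigma_v_km_s * (1.0 + z) / C_KM_S


def perturb_redshift(z, rng, sigma_v_km_s=100.0):
    """Return z plus a Gaussian error of standard deviation redshift_sigma(z)."""
    return z + rng.gauss(0.0, redshift_sigma(z, sigma_v_km_s))


# Illustrative (made-up) mock redshifts and a fixed seed for reproducibility.
rng = random.Random(42)
mock_redshifts = [0.25, 0.48, 0.71, 0.93]
observed = [perturb_redshift(z, rng) for z in mock_redshifts]
```

At z = 0.5 this gives a redshift scatter of about 5 × 10$^{-4}$.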
2.3. Detectability of groups

Since, according to our definition, a group is the set of
galaxies occupying the same DM halo, we can only hope
to detect those groups which host at least two galaxies
in the 10k sample. The collection of all these "detectable
groups" constitutes the ideal (or "real") group catalogue.
This is the best catalogue that can be produced with the
10k sample, and this is the catalogue we aim to reconstruct
with our groupfinder. Any DM halo hosting only
a single zCOSMOS 10k galaxy is not detectable and the
corresponding galaxies will be termed "field galaxies".
For this reason, even the ideal group catalogue that is
detectable with the 10k sample will not be a complete
rendition of the true underlying group population in the
COSMOS volume. Nevertheless, whenever we discuss
the statistical properties of a group catalogue, such as
completeness or purity (see § 3.2), these will be measured
relative to this "ideal" group catalogue, rather than the
underlying population.

In a flux limited survey such as zCOSMOS, the population
of galaxies that is observed changes with redshift,
and the same will also therefore be true of the groups.
For instance, for a group to be detectable at high redshift,
it has to host at least two rather bright galaxies.
Figure 2 shows the fraction of detectable groups in the
mocks (i.e. in the "ideal" catalogue in the previous
paragraph) as a function of the halo mass in the two
redshift bins 0.2 < z < 0.5 and 0.5 < z < 0.8. While
in the lower redshift bin the sample should be complete
down to ~ 5 × 10^… M⊙, this limit increases in the higher
redshift bin to ~ 2 × … However, in both bins
the bulk of the detectable halos are in the mass range
10^… M⊙ < M < 5 × 10^… M⊙.

Fig. 2. Fraction of detectable groups in the "ideal" mocks as
a function of DM halo mass. The left panel shows the redshift
range 0.2 < z < 0.5, and the right panel shows 0.5 < z < 0.8. The
upper panels show the number density of halos in the 10k mocks
(blue), in a purely 22.5 magnitude limited sample (red), and in
total (black), and the lower panels show the fraction of halos in
the 10k sample (solid line) and in the magnitude limited sample
(dashed line) with respect to the total number of halos at a given
mass. The shaded regions show the upper and lower quartiles of
the fractions among the 24 mocks. For both redshift ranges, the
10k sample was restricted to the central region of the survey.

3. GROUP-FINDING METHOD

In this section the different group-finding methods and
the statistical properties of the resulting group catalogues
are discussed. We have applied both the FOF and
VDM algorithms to our sample. In this way we are able
to compare the resulting group catalogues obtained by
the different methods and to investigate the robustness
of the results.

3.1. The FOF and VDM algorithms
3.1.1. FOF

The FOF algorithm is adopted from Eke et al. (2004).
It has three free parameters: the linking length $b$, the
maximum perpendicular linking length in physical coordinates
$L_{\max}$, and the ratio $R$ between the linking lengths
along and perpendicular to the line of sight. The exact
meaning of these parameters becomes clear from
the linking criteria. Consider two galaxies $i$ and $j$ with
comoving distances $d_i$ and $d_j$, respectively. These two
galaxies are assigned to the same group if their angular
separation $\theta_{ij}$ satisfies

$$\theta_{ij} \le \frac{1}{2}\left(\frac{l_{\perp,i}}{d_i} + \frac{l_{\perp,j}}{d_j}\right) \tag{1}$$

and, simultaneously, the difference between their distances
satisfies

$$|d_i - d_j| \le \frac{l_{\parallel,i} + l_{\parallel,j}}{2}. \tag{2}$$

$l_\perp$ and $l_\parallel$ are the comoving linking lengths perpendicular
and parallel to the line of sight defined by

$$l_\perp = \min\left[L_{\max}(1 + z),\ \frac{b}{\bar{n}^{1/3}}\right], \tag{3}$$

$$l_\parallel = R\, l_\perp, \tag{4}$$



where $\bar{n}$ is the mean density of galaxies. Since the sample
of galaxies is magnitude limited, the mean density of
galaxies decreases with redshift, leading to a steady increase
of the mean inter-galaxy separation with redshift.
Eke et al. (2004) argued that scaling both $l_\perp$ and $l_\parallel$ with
$\bar{n}^{-1/3}$ will compensate for the magnitude limit and lead
to groups of similar shape and overdensity throughout
the survey. The free parameter $L_{\max}$ has been introduced
to avoid unphysically large values for $l_\perp$ at high
redshifts where the galaxy distribution is sampled very
sparsely. Since $L_{\max}$ is measured in physical coordinates,
$L_{\max}(1 + z)$ is the maximal comoving linking length
perpendicular to the line of sight. Finally, the free parameter
$R$ allows $l_\parallel$ to be larger than $l_\perp$, taking into account the
elongation of groups along the line of sight due to the
fingers-of-god effect.
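The linking procedure of this section can be sketched in Python as follows. The sketch assumes the Eke et al. (2004) form of the criteria, i.e. $\theta_{ij} \le \frac{1}{2}(l_{\perp,i}/d_i + l_{\perp,j}/d_j)$ and $|d_i - d_j| \le \frac{1}{2}(l_{\parallel,i} + l_{\parallel,j})$, with $l_\perp = \min[L_{\max}(1+z),\, b\,\bar{n}^{-1/3}]$ and $l_\parallel = R\,l_\perp$. The dictionary layout of a galaxy, all function names, and the parameter values are hypothetical and for illustration only; they are not the optimized values used for the catalogue.

```python
import math


def linking_lengths(gal, b, L_max, R):
    """Comoving linking lengths (l_perp, l_par) for one galaxy.

    gal["n"] is the mean comoving galaxy density at the galaxy's redshift
    (Mpc^-3); L_max is in physical Mpc, hence the factor (1 + z).
    """
    l_perp = min(L_max * (1.0 + gal["z"]), b * gal["n"] ** (-1.0 / 3.0))
    return l_perp, R * l_perp


def angular_separation(g1, g2):
    """Angular separation in radians from RA/Dec in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = (math.radians(x) for x in
                            (g1["ra"], g1["dec"], g2["ra"], g2["dec"]))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return 2.0 * math.asin(math.sqrt(h))


def are_linked(g1, g2, b, L_max, R):
    """Apply the perpendicular and line-of-sight linking criteria."""
    lp1, ll1 = linking_lengths(g1, b, L_max, R)
    lp2, ll2 = linking_lengths(g2, b, L_max, R)
    on_sky = angular_separation(g1, g2) <= 0.5 * (lp1 / g1["d"] + lp2 / g2["d"])
    along_los = abs(g1["d"] - g2["d"]) <= 0.5 * (ll1 + ll2)
    return on_sky and along_los


def friends_of_friends(galaxies, b, L_max, R):
    """Return groups (sorted index lists, >= 2 members) via union-find."""
    parent = list(range(len(galaxies)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path compression
            k = parent[k]
        return k

    for i in range(len(galaxies)):
        for j in range(i + 1, len(galaxies)):
            if are_linked(galaxies[i], galaxies[j], b, L_max, R):
                parent[find(i)] = find(j)

    groups = {}
    for k in range(len(galaxies)):
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(m) for m in groups.values() if len(m) >= 2)


# Made-up galaxies: 'd' is the comoving distance in Mpc.
galaxies = [
    {"ra": 150.00, "dec": 2.20, "z": 0.30, "d": 1200.0, "n": 0.01},
    {"ra": 150.01, "dec": 2.20, "z": 0.30, "d": 1203.0, "n": 0.01},
    {"ra": 150.30, "dec": 2.50, "z": 0.30, "d": 1200.0, "n": 0.01},
]
groups = friends_of_friends(galaxies, b=0.1, L_max=0.4, R=15.0)
```

The pair loop is quadratic in the number of galaxies, which is acceptable for a sketch but would normally be replaced by a spatial index for a survey-sized sample.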

3.1.2. VDM 

The VDM algorithm was adopted from Gerke et al. 
(2005) which was itself based on the method developed 
by Marinoni et al. (2002). This algorithm is more 
complicated than the FOF and has six free parameters 
instead of three. Basically one needs a full Voronoi-Delaunay
tesselation of the input galaxy sample and
the volume of each Voronoi-cell. The Voronoi-Delaunay
tesselation was computed using Qhull (Barber, Dobkin,
& Huhdanpaa 1996) and the volumes of the Voronoi-cells
using the algorithm of Mirtich (1996).

The VDM algorithm can be divided into three phases. In
Phase I, the galaxies are sorted in ascending order of
Voronoi volume. Then, the first galaxy in this sorted
list is taken as a "seed galaxy" and a cylinder of radius
$R_{\rm I}$ and length $2L_{\rm I}$ in comoving coordinates is placed
around it such that the axis of the cylinder is directed
along the line of sight. If there is no other galaxy inside
this cylinder, the "seed galaxy" is regarded as a field
galaxy and one proceeds to the next galaxy in the list.
If, however, there are other galaxies within the cylinder,
Phase II starts. In this phase, a second cylinder
with radius $R_{\rm II}$ and length $2L_{\rm II}$ is defined and all galaxies
inside this second cylinder directly connected to the
seed galaxy or to its immediate Delaunay-neighbours by
means of the Delaunay-mesh are assigned to the same
group. The number of galaxies inside the second cylinder,
$N_{\rm II}$, is taken as an estimate of the central richness of
the group. In Phase III, a third cylinder, whose dimensions
are given by

$$R_{\rm III} = r\,(\tilde{N}_{\rm II})^{1/3}, \tag{5}$$

$$L_{\rm III} = l\,(\tilde{N}_{\rm II})^{1/3} f(z), \tag{6}$$

is defined, where $r$ and $l$ are two free parameters, $\tilde{N}_{\rm II}$ is
the central richness corrected for the redshift dependent
mean density $\bar{n}(z)$, and $f(z)$ is a function introduced to
take into account that for a fixed velocity dispersion the
length of the fingers of god in redshift space is a function
of redshift. $\tilde{N}_{\rm II}$ and $f(z)$ are given by

$$\tilde{N}_{\rm II} = \frac{\langle \bar{n} \rangle}{\bar{n}(z)}\, N_{\rm II} \tag{7}$$

and

$$f(z) = \frac{s(z)}{s(z_{\rm ref})}, \qquad s(z) = \frac{1 + z}{\sqrt{\Omega_m (1 + z)^3 + \Omega_\Lambda}}, \tag{8}$$

respectively, where $z_{\rm ref}$ is an arbitrary reference redshift
chosen to be 0.5. In this third phase, again all galaxies
within the third cylinder are assigned to the current
group. After fixing $z_{\rm ref}$, 6 free parameters $R_{\rm I}$, $L_{\rm I}$, $R_{\rm II}$,
$L_{\rm II}$, $r$, and $l$ remain. The reader is referred to Table 1
for typical parameter-sets for the two group-finders.
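The Phase III scaling can be illustrated with a short Python sketch. It assumes the Gerke et al. (2005) form of the relations, i.e. $R_{\rm III} = r\,\tilde{N}_{\rm II}^{1/3}$ and $L_{\rm III} = l\,\tilde{N}_{\rm II}^{1/3} f(z)$ with $f(z) = s(z)/s(z_{\rm ref})$ and $s(z) = (1+z)/\sqrt{\Omega_m(1+z)^3 + \Omega_\Lambda}$. The density correction, written here as the ratio of a reference mean density to $\bar{n}(z)$, is likewise an assumption, as are all function names and numerical values.

```python
import math


def s(z, omega_m=0.25, omega_l=0.75):
    """Redshift stretch factor (1 + z) / E(z) for a flat LambdaCDM cosmology."""
    return (1.0 + z) / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def f(z, z_ref=0.5):
    """Relative length of the fingers of god at z with respect to z_ref."""
    return s(z) / s(z_ref)


def phase3_cylinder(n_II, n_mean_z, n_mean_ref, r, l, z, z_ref=0.5):
    """Return (R_III, L_III) for a seed with central richness n_II.

    n_mean_z is the mean galaxy density at redshift z and n_mean_ref the
    reference mean density used to correct the richness (assumed form).
    """
    n_corr = n_II * n_mean_ref / n_mean_z
    scale = n_corr ** (1.0 / 3.0)
    return r * scale, l * scale * f(z, z_ref)


# Made-up example: richness 8 at the reference redshift, no density correction.
R_III, L_III = phase3_cylinder(n_II=8, n_mean_z=0.01, n_mean_ref=0.01,
                               r=0.3, l=5.0, z=0.5)
```

With these illustrative numbers the third cylinder has a radius of 0.6 and $L_{\rm III} = 10$, in the same units as $r$ and $l$.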

It can be seen that both group-finding algorithms are 
somewhat arbitrary and neither is directly linked to the
physical basis of a group, namely virialized motion within 
a common potential well. While it seems that the VDM 
algorithm is at least partly motivated by certain scaling 
relations for groups (Gerke et al. 2005), this is at the 
expense of simplicity which is clearly the mark of the 
FOF algorithm. 

For a given set of sites in space, the "Voronoi-cell" of a certain 
site consists of all points closer to this site than to any other site. 
Furthermore, two sites whose Voronoi-cells share a common in- 
terface are called "Delaunay-neighbours" . The "Voronoi-Delaunay 
tesselation" for a given set of sites is the complete set of all its 
Voronoi-cells and Delaunay-neighbours. For more formal defini- 
tions and basic properties of Voronoi-Delaunay tesselations we refer 
to basic textbooks of geometry. 

http://www.qhull.org 



3.2. Basic statistical quantities 

In order to assess the performance of a groupfinder, re- 
alistic mock catalogues containing full information about 
the underlying DM halos and their properties are needed. 
In this section, we introduce some useful statistics to 
characterize the overall fidelity of the resulting group cat- 
alogues. 

The fidelity of the group catalogue can be assessed 
through comparing the "reconstructed" groups, obtained 
by running the groupfinder on the mock catalogues, to 
the "real" group catalogue described above - i.e. the set 
of all DM haloes in the mocks that contain, after the 10k 
selection criteria have been applied, at least two galax- 
ies. The comparison is therefore of two identical point 
sets, the galaxies in the mocks, whose points are grouped 
together in possibly different ways. This is schematically 
illustrated in Figure 3.

We follow here the definitions and notations of Gerke 
et al. (2005). The two big circles constitute two group 
catalogues. Each point corresponds to a galaxy of the 
input galaxy sample and the encircled galaxies belong 
to the same group. In the left-hand catalogue are the 
"real groups" as given by the DM halos in the simulation, 
while in the right-hand catalogue are the "reconstructed 
groups" as identified by our groupfinder. Some sort of 
measure is needed of how many reconstructed groups can 
be identified with real groups and how many real groups 
are recovered by our groupfinder. Following Gerke et al. 
(2005), we define the following terms: 

Association: A group i is associated to another group
j if group j contains more than the fraction f of
the members of group i. For this association to
be unique, f ≥ 0.5 must hold. Throughout this
paper, we set f = 0.5 as did Gerke et al. (2005).

One-way-match: If group i is associated to group j, but
group j is not associated to group i (illustrated by
an arrow from group i to group j).

Two-way-match: If group i is associated to group j and 
vice versa (illustrated by a double-arrow). 

While each group can only have a single, unique associ- 
ated group (i.e. an arrow pointing away), it might well 
happen that a certain group is the associated group for 
many other groups (i.e. many arrows pointing toward 
it). We therefore have the following terminology: 

Over-merging: If more than one real group is associated 
to the same reconstructed group. 

Fragmentation: If more than one reconstructed group 
is associated to the same real group. 

Spurious group: A reconstructed group which has no 
associated real group. 

Undetected group: A real group which has no associated 
reconstructed group. 

Group galaxy: A galaxy which belongs to a group. 

Field galaxy: A galaxy not associated to any group. 
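The terminology above can be made concrete with a small Python sketch in which each group is a set of galaxy identifiers. The two toy catalogues and all function names are hypothetical; the association rule uses f = 0.5 as stated in the text.

```python
from collections import Counter


def associated_group(group, catalogue, f=0.5):
    """Index of the group in `catalogue` containing more than a fraction f
    of the members of `group`, or None if there is no such group."""
    for k, other in enumerate(catalogue):
        if len(group & other) > f * len(group):
            return k
    return None


def compare_catalogues(real, rec, f=0.5):
    """Classify the associations between real and reconstructed groups."""
    real_to_rec = [associated_group(g, rec, f) for g in real]
    rec_to_real = [associated_group(g, real, f) for g in rec]
    hits_on_rec = Counter(j for j in real_to_rec if j is not None)
    hits_on_real = Counter(i for i in rec_to_real if i is not None)
    return {
        # (real index, reconstructed index) pairs associated in both directions
        "two_way": [(i, j) for i, j in enumerate(real_to_rec)
                    if j is not None and rec_to_real[j] == i],
        # reconstructed groups that several real groups point to
        "over_merged": sorted(j for j, n in hits_on_rec.items() if n > 1),
        # real groups that several reconstructed groups point to
        "fragmented": sorted(i for i, n in hits_on_real.items() if n > 1),
        "spurious": [j for j, i in enumerate(rec_to_real) if i is None],
        "undetected": [i for i, j in enumerate(real_to_rec) if j is None],
    }


# Toy catalogues (made up): galaxies are labelled by integers.
real = [{1, 2, 3}, {4, 5}, {6, 7, 8, 9}, {10, 11}]
rec = [{1, 2, 3, 4, 5}, {6, 7}, {8, 9}, {12, 13}]
result = compare_catalogues(real, rec)
```

In this toy case the first reconstructed group over-merges two real groups, while the real group {6, 7, 8, 9} is fragmented and, since neither fragment contains more than half of its members, also counts as undetected.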
Fig. 3. Schematic illustration of comparing a reconstructed group catalogue to a real group catalogue as obtained from a DM simulation.
The left big circle constitutes the real group catalogue and the right big circle the reconstructed group catalogue. Each point displays a
galaxy and the encircled points inside the big circles constitute groups. A group in the real (reconstructed) catalogue may be associated to
a group in the reconstructed (real) catalogue (see the text for details). Such an association is indicated by an arrow pointing from the real
(reconstructed) group to the reconstructed (real) group. If there is an arrow pointing from one group to another and also an arrow pointing
backwards, such an association is termed a "two-way-match". Otherwise it is just a "one-way-match". If more than one reconstructed
group points to the same real group, this is called "fragmentation"; if there is more than one real group associated to the same reconstructed
group, this is called "over-merging".
Optimal multi-run parameter-sets for FOF and VDM
Note. The definitions of these parameters are given in § 3.1.
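With this terminology, the following statistical measures
can be defined that together describe the overall
fidelity of the reconstructed group catalogue and thus
its potential usefulness for quantitative analysis. Let
$N_{\rm gr}^{\rm real}(N_{\rm real})$ denote the number of real groups with $N_{\rm real}$
members, and $N_{\rm gr}^{\rm rec}(N_{\rm rec})$ the number of reconstructed
groups with $N_{\rm rec}$ members. Then by

$$A\left[N_{\rm gr}^{\rm real}(N_{\rm real}) \rightarrow N_{\rm gr}^{\rm rec}(N_{\rm rec})\right] \tag{9}$$

we denote the number of associations of real groups with
$N_{\rm real}$ members to reconstructed groups with $N_{\rm rec}$ members.
In the same way,

$$A\left[N_{\rm gr}^{\rm rec}(N_{\rm rec}) \rightarrow N_{\rm gr}^{\rm real}(N_{\rm real})\right] \tag{10}$$

denotes the number of associations of reconstructed
groups with $N_{\rm rec}$ members to real groups with $N_{\rm real}$
members. The analogous notations for the numbers of
two-way-associations are

$$A\left[N_{\rm gr}^{\rm real}(N_{\rm real}) \leftrightarrow N_{\rm gr}^{\rm rec}(N_{\rm rec})\right], \tag{11}$$

$$A\left[N_{\rm gr}^{\rm rec}(N_{\rm rec}) \leftrightarrow N_{\rm gr}^{\rm real}(N_{\rm real})\right]. \tag{12}$$

Note that the last two expressions are equivalent to each
other. With these notations we can formally introduce
the "one-way completeness" $c_1(N)$ and the "two-way
completeness" $c_2(N)$ by

$$c_1(N) = \frac{A\left[N_{\rm gr}^{\rm real}(\ge N) \rightarrow N_{\rm gr}^{\rm rec}(\ge 2)\right]}{N_{\rm gr}^{\rm real}(\ge N)}, \tag{13}$$

$$c_2(N) = \frac{A\left[N_{\rm gr}^{\rm real}(\ge N) \leftrightarrow N_{\rm gr}^{\rm rec}(\ge 2)\right]}{N_{\rm gr}^{\rm real}(\ge N)}. \tag{14}$$

Analogously, we define the "one-way purity" $p_1(N)$ and
"two-way purity" $p_2(N)$ as

$$p_1(N) = \frac{A\left[N_{\rm gr}^{\rm rec}(\ge N) \rightarrow N_{\rm gr}^{\rm real}(\ge 2)\right]}{N_{\rm gr}^{\rm rec}(\ge N)}, \tag{15}$$

$$p_2(N) = \frac{A\left[N_{\rm gr}^{\rm rec}(\ge N) \leftrightarrow N_{\rm gr}^{\rm real}(\ge 2)\right]}{N_{\rm gr}^{\rm rec}(\ge N)}. \tag{16}$$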

The one-way completeness $c_1(N)$ is a measure of the
fraction of real groups with N or more members that
are successfully recovered in the reconstructed group
catalogue, and the one-way purity $p_1(N)$ is a measure
of the fraction of reconstructed groups with N or more
members that belong to real groups. The higher $c_1(N)$,
the smaller the fraction of undetected groups, $1 - c_1(N)$,
and the higher $p_1(N)$, the smaller the fraction of spurious
groups, $1 - p_1(N)$. On the other hand, the smaller
the ratios $c_2(N)/c_1(N)$ or $p_2(N)/p_1(N)$, the more
over-merging or fragmentation, respectively, is present. By
definition the four quantities $c_1(N)$, $c_2(N)$, $p_1(N)$, and
$p_2(N)$ all take only values between 0 and 1.

While Gerke et al. (2005) have introduced these four
quantities $c_1$, $c_2$, $p_1$, and $p_2$ globally for a group
catalogue including all groups, we have defined them to be
functions of the number of members N ("richness"). It
will become clear below that investigating these statistics
as a function of N is very useful for improving the
performance of a group catalogue. Note that the argument N
always means "for groups with N or more members" as is
clear from their definitions, so for N = 2 the two
definitions are identical. Throughout this paper we will always
consider the set of groups down to a given richness-class
N, so this convention eases the notation. It would, however,
be straightforward to define the analogous quantities
in a non-cumulative way.
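As a minimal sketch of how these cumulative statistics could be evaluated on a mock, the following Python code counts one-way and two-way associations for groups with N or more members. Groups are represented as sets of galaxy identifiers; the toy catalogues and the function names are hypothetical, and f = 0.5 is used as in the text.

```python
def associated_group(group, catalogue, f=0.5):
    """Index of the group in `catalogue` containing more than a fraction f
    of the members of `group`, or None if there is no such group."""
    for k, other in enumerate(catalogue):
        if len(group & other) > f * len(group):
            return k
    return None


def one_and_two_way_fraction(cat_a, cat_b, N, f=0.5):
    """Fractions of groups in cat_a with >= N members that have a one-way
    and a two-way association to any group in cat_b."""
    rich = [i for i, g in enumerate(cat_a) if len(g) >= N]
    if not rich:
        return float("nan"), float("nan")
    one = two = 0
    for i in rich:
        j = associated_group(cat_a[i], cat_b, f)
        if j is None:
            continue
        one += 1  # every two-way association is also counted as one-way
        if associated_group(cat_b[j], cat_a, f) == i:
            two += 1
    return one / len(rich), two / len(rich)


def completeness_and_purity(real, rec, N, f=0.5):
    """Return (c1, c2, p1, p2) for groups with N or more members."""
    c1, c2 = one_and_two_way_fraction(real, rec, N, f)
    p1, p2 = one_and_two_way_fraction(rec, real, N, f)
    return c1, c2, p1, p2


# Toy catalogues (made up): galaxies are labelled by integers.
real = [{1, 2, 3}, {4, 5}, {6, 7, 8, 9}, {10, 11}]
rec = [{1, 2, 3, 4, 5}, {6, 7}, {8, 9}, {12, 13}]
stats_N2 = completeness_and_purity(real, rec, N=2)
stats_N3 = completeness_and_purity(real, rec, N=3)
```

For the toy catalogues the ratio $c_2/c_1$ is 0.5 at N = 2, reflecting the over-merging of two real groups into one reconstructed group.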

While c_1(N), p_1(N), etc. are statistical quantities on a
group-to-group basis, statistical quantities on a galaxy-to-group
basis may be useful as well. Therefore, following
Gerke et al. (2005), we define the "galaxy success rate"
S_gal(N) and the "interloper fraction" f_I(N) as

S_{\rm gal}(N) = \frac{\left[ S_{\rm gal}^{\rm real}(N) \cap S_{\rm gal}^{\rm rec}(2) \right]}{\left[ S_{\rm gal}^{\rm real}(N) \right]}    (17)

f_{\rm I}(N) = \frac{\left[ S_{\rm gal}^{\rm rec}(N) \cap S_{\rm field}^{\rm real} \right]}{\left[ S_{\rm gal}^{\rm rec}(N) \right]}    (18)

where S_gal^real(N) is the set of galaxies associated to real
groups of N or more members, S_gal^rec(N) the set of galaxies
associated to reconstructed groups of N or more members, and S_field^real
the set of real field galaxies. The square brackets [.] here
denote the number of elements in a set and ∩ is the
usual intersection from set theory. Thus the galaxy success
rate S_gal(N) is just the fraction of galaxies belonging to
real groups of richness ≥ N that have ended up in any
reconstructed group, and the interloper fraction f_I(N) is
the fraction of galaxies belonging to reconstructed groups
of richness ≥ N that are field galaxies ("interlopers").
Like c_1(N), p_1(N), etc., S_gal(N) and f_I(N) also take
values between 0 and 1.
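
As a minimal sketch of these two galaxy-based statistics, the set definitions translate directly into set operations on galaxy identifiers. The function names and the toy IDs below are hypothetical and serve only as an illustration.

```python
def galaxy_success_rate(real_group_galaxies, reconstructed_group_galaxies):
    """Fraction of real group galaxies that ended up in any reconstructed group."""
    real = set(real_group_galaxies)
    rec = set(reconstructed_group_galaxies)
    return len(real & rec) / len(real)


def interloper_fraction(reconstructed_group_galaxies, field_galaxies):
    """Fraction of reconstructed group galaxies that are real field galaxies."""
    rec = set(reconstructed_group_galaxies)
    field = set(field_galaxies)
    return len(rec & field) / len(rec)


# Hypothetical toy sample of galaxy IDs (not real data).
real_members = {1, 2, 3, 4, 5}   # galaxies in real groups
rec_members = {1, 2, 3, 6, 7}    # galaxies in reconstructed groups
field = {6, 7, 8}                # real field galaxies

s_gal = galaxy_success_rate(real_members, rec_members)  # 3 of 5 recovered
f_i = interloper_fraction(rec_members, field)           # 2 of 5 are interlopers
```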

It is well known (e.g. Frederic 1995, Gerke et al. 2005)
that a perfect reconstructed group catalogue is impossible
to achieve and, furthermore, that completeness and
purity tend to be mutually exclusive. As would be expected,
the higher the completeness, the lower the purity,
and vice versa (see Figure 4). There is also a similar
dichotomy between over-merging and fragmentation.
Therefore, we introduce additional measures of "goodness"
which combine statistics such as completeness
and purity in such a way that maximizing (or minimizing)
them yields a sort of "optimal" group catalogue. We formally
define (omitting the dependence on N for the
sake of clarity):



g_1 = \sqrt{(1 - c_1)^2 + (1 - p_1)^2}    (19)

g_2 = \frac{c_2}{c_1} \, \frac{p_2}{p_1}    (20)

g_3 = \sqrt{(1 - S_{\rm gal})^2 + f_{\rm I}^2}    (21)



The meaning of these quantities is as follows: Since a
perfect group catalogue features (c_1, p_1) = (1, 1), i.e. entirely
complete and absolutely pure, the reconstructed
group catalogue should come as close as possible to this
point in the c_1-p_1-plane. So g_1 gives the distance to this
optimal point in the c_1-p_1-plane and thus is a measure
of the balance of completeness and purity. Then, a good
group catalogue should exhibit c_1 ~ c_2 and p_1 ~ p_2,
meaning that essentially no over-merging and fragmentation
is present in the catalogue. Hence, g_2 measures
the balance between over-merging and fragmentation and
should also approach 1. Finally, g_3 is similar to g_1 but
is on a galaxy-to-group basis instead of a group-to-group
basis. As is clear from their definitions, these measures
of goodness again take only values between 0 and 1. It
is clear that g_1 and g_3 should be minimized, while g_2
should be maximized.
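
The three goodness measures can be evaluated as in the following minimal sketch, which follows the verbal definitions above (distance to the perfect point for g_1 and g_3, product of the two-way to one-way ratios for g_2). The function name and the input values are hypothetical and are not taken from the mocks.

```python
import math


def goodness(c1, c2, p1, p2, s_gal, f_i):
    """Return (g1, g2, g3) for one parameter-set.

    g1: distance to the perfect point (c1, p1) = (1, 1); to be minimized.
    g2: product of the ratios c2/c1 and p2/p1; to be maximized.
    g3: distance to the perfect point (s_gal, f_i) = (1, 0); to be minimized.
    """
    g1 = math.hypot(1.0 - c1, 1.0 - p1)
    g2 = (c2 / c1) * (p2 / p1)
    g3 = math.hypot(1.0 - s_gal, f_i)
    return g1, g2, g3


# Illustrative inputs only (hypothetical values).
g1, g2, g3 = goodness(c1=0.8, c2=0.72, p1=0.75, p2=0.69, s_gal=0.85, f_i=0.2)

# A perfect catalogue sits at g1 = 0, g2 = 1, g3 = 0.
perfect = goodness(1.0, 1.0, 1.0, 1.0, 1.0, 0.0)
```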

3.3. Optimization strategy 

Since there exists no single perfect reconstructed group 
catalogue, one has to optimize the group-finding param- 
eters, in principle, in a way that the resulting group 
catalogue serves as well as possible the intended scien- 
tific purpose. However, as we will see, there seems to 
be a rather natural way to construct a group catalogue 
which is useful for many different purposes. The only 
way to find such optimal parameters of a groupfinder is 
to run it on the mocks for different parameter-sets, and 
to compare the resulting group catalogues by means of 
the statistics introduced in the previous section. 

The completeness c_1(8) and purity p_1(8) of the reconstructed
group catalogues, after running FOF and VDM
over a large parameter space, are shown in Figure 4. It
is obvious that the points do not extend arbitrarily close
to the upper right corner (i.e. the perfect group catalogue).
The parameters c_1(8) and p_1(8) are in some
sense anti-correlated. In fact, the cloud of points seems
to feature a boundary toward high completeness and purity
beyond which there is a region totally free of points.
It is notable how similar this boundary is for the FOF and
VDM approaches: clearly neither is markedly superior
to the other. The same holds for the g_2(8)-goodness,
color coded in the figure, along this boundary region.
These similarities between FOF and VDM are observed
for all richness classes N. This indicates that this boundary
is probably the limit of what can be achieved with a
zCOSMOS-10k-like sample and does not depend on the
choice of algorithm. This also suggests that the choice
of a particular groupfinder such as FOF or VDM is less
important than sometimes argued, although, as we will
see, the properties of group catalogues obtained using
the two groupfinders are not absolutely identical.

Fig. 4. Distributions of parameter-sets in the c_1(8)-p_1(8)-plane for a wide range of group-finding parameters. The left panel shows the
parameter-sets for FOF and the right panel those for VDM. Each parameter-set is positioned at the average value for the 24 separate mock
catalogues. The parameter-sets are color coded by the goodness parameter g_2(8), indicating the degree of over-merging or fragmentation.
The dotted line is the largest circle around the upper right corner that is empty of points, i.e. the radius of this circle is equal to the smallest
g_1(8) value. The best g_1(8) parameter-set is marked by a diamond and the error bars exhibit the scatter among the 24 mocks for this
particular parameter-set. The labeled black points show where the best g_1(N) sets for different N reside on this plane, N being
denoted by the label of the points. Although these best sets inhabit, in general, very different places, they converge for N > 8, at least for
FOF. The position of the best g_2(8)-set is marked by a triangle and that of the best g_3(8)-set by a square.

the corresponding N. For FOF, the best g_1(8)-set is
optimal for all N > 8 as well, while for N < 8 the optimal
g_1(N)-sets reside at lower completeness. For VDM this
is less obvious, but at least for N > 10 the best g_1(N)-sets
seem to converge. In any case, it is clear that it
is not possible to simultaneously optimize g_1(N) for all
N with a single parameter-set. If the parameter-set is
optimized for groups with N ≥ 8, the resulting group
catalogue is very complete for groups with N < 8, but
the purity starts to decrease severely for N < 5, and
a lot of spurious small groups enter the catalogue (see
Figures 5 and 6). Since around 80% of the groups
have N < 5, this is unsatisfactory.

This suggests that the groupfinder should be run several
times with different parameter-sets, each time optimized
for a different richness range. This is analogous
to the "hot-cold" double pass approach often used with
image detection algorithms such as SExtractor. We will
refer to this approach as the "multi-run procedure", and
it was implemented as follows:

1. The parameter-set is optimized for the range N ≥
6, the groupfinder is run, and only those groups
that are in this richness range are kept in the group
catalogue.

2. The parameter-set is then optimized for groups
with N = 5, the groupfinder is run again, and only
groups with N = 5 that were not already detected in the
first step are added to the group catalogue.

3. Repeat the previous step for N = 4.

4. Repeat the previous step for N = 3.

5. Repeat the previous step for N = 2.
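
The steps above can be sketched as a simple loop. This is only an illustration under stated assumptions: `run_groupfinder` is a hypothetical callable standing in for FOF or VDM, the parameter labels are placeholders, and a group is treated as "already detected" if it shares any member with a previously accepted group, which is one possible reading of the criterion and is not specified in the text.

```python
def multi_run(run_groupfinder, steps):
    """Build a group catalogue by running a groupfinder once per richness range.

    run_groupfinder: hypothetical callable, params -> iterable of member-ID sets.
    steps: list of (params, in_range) pairs ordered from rich to poor, where
           in_range(n) tells whether richness n belongs to that step.
    """
    catalogue = []
    assigned = set()  # galaxies already placed in an accepted group
    for params, in_range in steps:
        for group in run_groupfinder(params):
            members = frozenset(group)
            if not in_range(len(members)):
                continue  # keep only groups in this step's richness range
            if members & assigned:
                continue  # assumption: any overlap means "already detected"
            catalogue.append(members)
            assigned |= members
    return catalogue


# Hypothetical toy output of a groupfinder for two parameter-sets.
toy_runs = {
    "tuned for N>=6": [{1, 2, 3, 4, 5, 6}, {7, 8}],
    "tuned for N=2": [{1, 2}, {7, 8}, {9, 10}],
}
steps = [
    ("tuned for N>=6", lambda n: n >= 6),
    ("tuned for N=2", lambda n: n == 2),
]
catalogue = multi_run(toy_runs.__getitem__, steps)
```

In this toy case the pair {1, 2} found in the second run is rejected because its galaxies already belong to the rich group accepted in the first run.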

In each step, only those groups are accepted which have 
not been found in an earlier step. It is better to work 
down in richness because the richer groups are more eas- 
ily detected. The optimal parameter set in each step 



VDM, much more than FOF, also exhibits some scatter
in the range given by 0.5 < c_1(8) < 0.85 and
p_1(8) > 0.65. The existence of such parameter-sets is
a natural side-effect of the relatively large number of
free parameters of the VDM groupfinder, which results in
many parameter combinations with obviously suboptimal
properties in terms of c_1(8) and p_1(8). The extent
of this scatter, of course, also depends strongly on the explored
range of values in the parameter space. Since we
are interested in parameter-sets yielding simultaneously
high completeness and high purity, we will focus only on
the boundary mentioned above.

The challenge is to find the best group catalogues
among those plotted in Figure 4, making the best compromise
between c_1 and p_1. A natural choice is the point
that lies closest to (c_1, p_1) = (1, 1), indicated by the diamond.
According to equation (19), this is the point
where g_1 is minimal. We will refer to this parameter-set
as the "best g_1-set". It defines a circle around the upper
right corner (dotted line) that is entirely empty of points.

In addition to minimizing g_1, one would prefer, of
course, to simultaneously maximize g_2 and minimize g_3.
In general, the best parameter-sets for these three goodnesses
will not coincide. Rather, it turns out that the best
g_2-set usually lies at slightly higher completeness relative
to the best g_1-set (see triangles in Figure 4), while the
best g_3-set usually lies at slightly lower completeness (see
squares in Figure 4). However, as is clear from Figure 4,
the gradient of g_2 is rather shallow around the best
g_1-set, where g_2 is nearly maximal, so that the precise site of the
optimal g_2-set is not that important. The same holds for
the gradient of g_3. In the end, the best g_1-set appears to be
a good choice.
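
Selecting the best g_1-set from a grid of parameter-sets amounts to a nearest-point search in the c_1-p_1-plane. The sketch below assumes each candidate is given as a (label, c_1, p_1) triple; the labels and values are hypothetical and are not taken from the mocks.

```python
import math


def best_g1_set(candidates):
    """Return the (label, c1, p1) triple closest to the perfect point (1, 1)."""
    return min(candidates, key=lambda s: math.hypot(1.0 - s[1], 1.0 - s[2]))


# Hypothetical grid of parameter-sets with their mean completeness and purity.
grid = [
    ("set A", 0.90, 0.60),
    ("set B", 0.80, 0.80),
    ("set C", 0.60, 0.90),
]
best = best_g1_set(grid)

# Radius of the largest circle around (1, 1) that is empty of points.
radius = math.hypot(1.0 - best[1], 1.0 - best[2])
```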

3.3.1. Multi-run procedure 

Since c_1(N), c_2(N), etc. are functions of richness N,
one might wonder how the best g_1(N)-sets for different
N are distributed in the c_1(8)-p_1(8)-plane. This is shown
in Figure 4 by the labeled points where the labels denote

Fig. 5. Comparison of the completeness and purity obtained
from a single run and from the multi-run procedure. The left two
panels show the statistics for FOF and the right two panels for
VDM. In each panel, blue corresponds to the single run
and red to the multi-run procedure. In the upper two panels,
the solid lines display the one-way completeness c_1, and in the
lower two panels they show the one-way purity p_1. In each panel,
the dotted lines display the corresponding two-way quantities,
c_2 or p_2. The purity obtained from the multi-run
procedure is more balanced than that from the single run. For
FOF this also leads to a more balanced completeness.

Fig. 6. Relative abundance of reconstructed groups as compared
with the real groups as a function of richness N. The green
line shows the mean relative abundance of single-run FOF groups,
the blue line the mean relative abundance of multi-run FOF groups,
and the red line the mean relative abundance of one-way-matched
groups. The errorbars always exhibit the scatter among the 24
mocks. The gray shaded region displays the spread of the relative
abundance of real groups among the 24 mocks (i.e. cosmic variance
plus shot noise). For N < 6 the number of multi-run FOF
groups is slightly too high and exceeds the margin of cosmic variance,
while the abundance of the one-way-matched groups is well
within the region dominated by cosmic variance. For comparison,
the relative abundance of multi-run VDM groups is shown as well
(black dotted line).

is basically just the best g_1-set for the corresponding
richness range. However, particularly in the first step,
other choices are also possible. In fact, for VDM, we
have chosen a special set for the first step since the best
g_1(6)-set proved to be by no means optimal for N > 8.
Table 1 gives the optimal parameter-sets for FOF and
VDM. Since there are some degeneracies between the
parameters, there are no simple trends from step 1 to
step 5 for the single parameters.

Fig. 7. Comparison between the multi-run FOF and the multi-run
VDM catalogues. The upper panel shows the fraction of FOF
groups associated to VDM groups. Red corresponds to the
real data 10k group catalogue, with the solid line designating
one-way-matches and the dotted line two-way-matches. The black
solid line corresponds to the mean fraction of associations in the
mocks, if groups with N > 20 are omitted, and the error bars
exhibit the scatter among the 24 mocks. The black dotted line
shows the same, if all groups are taken into account. In the lower
panel, the symbols have identical meaning but exhibit the fraction
of VDM groups associated to FOF groups.

Figure 5 shows how the multi-run procedure compares
to the single-run best g_1(8)-set. In the case of FOF,
the completeness has slightly decreased for N < 5 compared
to the single run, but the high completeness of the
single run in this richness range comes at the cost of a
low purity. In fact, for the multi-run, the purity has increased
for N < 5, and has become almost constant for
all richness classes. Thus, the overall behaviour of the
completeness and purity is now more balanced.

For VDM, we observe a similar trend. Here, it is particularly
evident that in a single run, even if optimized
for N ≥ 8, the completeness decreases for N > 6. The
multi-run procedure can correct for this and still increase
the purity for small groups.

While the overall statistics of the two multi-run
catalogues are similar, there are some minor differences:
The overall behaviour of the completeness and purity as
a function of N seems to be more balanced for FOF. Also
the ratios c_2/c_1 and p_2/p_1 are more balanced for FOF,
while for VDM, c_2/c_1 increases and p_2/p_1 decreases toward
higher N. On the other hand, the total number of
groups found with FOF is too high for N < 3, while for
VDM, the number of reconstructed groups is too high for
N > 5 (Figure 6). All things considered, the multi-run
procedure works better with FOF than with VDM.

3.3.2. Combining FOF and VDM 

With the FOF and VDM multi-run group catalogues,
there are now two catalogues available, obtained by different
algorithms, and exhibiting similar purity and completeness.
A comparison of the two catalogues on a
group-to-group basis is shown in Figure 7. The red lines
show the result for the real 10k sample. An FOF group
with N ≥ 2 has a probability of ~80% of being associated with a
VDM group, increasing roughly linearly with N
until it reaches 100% for N > 10. On the other hand,
the probability of any VDM group being associated with
an FOF group is greater than ~80%, and even higher
than ~90% for N > 8. The reason that for N < 4
the VDM groups have a higher probability of being associated
with the FOF groups than vice versa is
the excess production of small groups in the FOF catalogue.
Furthermore, note that whenever a group with
N > 6 has an associated group, this association is a
two-way-association. Thus, the two catalogues, though not
identical, contain mainly the same structures. Moreover,
the real data agree very well with the mocks (black solid
lines), if groups with N > 20 in the mocks are omitted (in
the mocks there are too many of them, see § 5.1). This
shows that the groupfinders indeed work comparably on
the real data and on the mocks.

Is there a way to combine the information in the two 
catalogues in order to obtain a single optimal group cata- 
logue? It seems natural to consider those group galaxies 
that were recovered by both groupfinders. We introduce 
a "galaxy purity parameter" (GAP) for each galaxy. The 
GAP is a flag indicating if a certain group galaxy is con- 
tained simultaneously in both catalogues. For a certain 
FOF group galaxy it is defined as follows: 

• If there is no VDM group containing this galaxy, it 
gets a GAP equal to 0. 

• If it is also contained in a VDM group, and the FOF 
group has a one-way-match to this VDM group or 
this VDM group exhibits a one-way-match to the 
FOF group, the galaxy gets a GAP equal to 1. 

• If it is contained in a VDM group, and the FOF 
group has a two-way-match to this VDM group, 
the galaxy gets a GAP of 2. 

Thus, we expect that the higher the GAP for a galaxy,
the more reliable the detection, and the higher the probability
that this galaxy is a real group galaxy and not
an artefact introduced by one of the groupfinders. The
GAP is a useful flag for excluding uncertain group members
if needed, and defines more clearly the reliable core
of a group.
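
The GAP rules listed above can be sketched as follows. The one-way and two-way match relations between FOF and VDM groups are taken here as given inputs (sets of group-ID pairs); all names and toy IDs are hypothetical. One case is not covered by the rules as stated, namely a galaxy that sits in a VDM group with no match of either kind to its FOF group; the sketch assumes a GAP of 0 for it.

```python
def galaxy_purity(fof_groups, vdm_groups, one_way, two_way):
    """Assign a GAP flag (0, 1 or 2) to every FOF group galaxy.

    fof_groups, vdm_groups: dict group-ID -> set of galaxy IDs.
    one_way: set of (fof_id, vdm_id) pairs with a one-way match in either direction.
    two_way: set of (fof_id, vdm_id) pairs with a two-way match.
    """
    vdm_of = {gal: vid for vid, members in vdm_groups.items() for gal in members}
    gap = {}
    for fid, members in fof_groups.items():
        for gal in members:
            vid = vdm_of.get(gal)
            if vid is None:
                gap[gal] = 0  # not contained in any VDM group
            elif (fid, vid) in two_way:
                gap[gal] = 2
            elif (fid, vid) in one_way:
                gap[gal] = 1
            else:
                gap[gal] = 0  # assumption: unmatched groups give no credit
    return gap


# Hypothetical toy catalogues.
fof = {"F1": {1, 2, 3}, "F2": {4, 5}, "F3": {6, 7}}
vdm = {"V1": {1, 2, 8}, "V2": {4, 9}}
two_way = {("F1", "V1")}
one_way = {("F1", "V1"), ("F2", "V2")}

gap = galaxy_purity(fof, vdm, one_way, two_way)
```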

Then, we can define two sub-sets, or "sub-catalogues",
of the basic FOF catalogue. The "one-way-matched"
(1WM) sub-catalogue contains only FOF group galaxies
with a GAP ≥ 1. In a similar way, the "two-way-matched"
(2WM) sub-catalogue contains only group galaxies
with a GAP = 2, i.e. all galaxies with a GAP ≤ 1
become field galaxies. Note that we have defined the
GAP, and thus the 1WM and 2WM, based on the FOF
groups. They could, of course, also be defined based
on the VDM groups. However, to obtain a single optimal
group catalogue, we have to choose between FOF
and VDM. As discussed in the last section, though the
multi-run catalogues obtained by these groupfinders exhibit
similar statistics, some (minor) properties are overall
better for FOF. So we have chosen the FOF catalogue
as the basic catalogue. The VDM catalogue, by
contrast, is only used to determine the GAPs
of FOF group members. Since the two sub-catalogues
preserve the group structure of the basic FOF catalogue,
this set of three group catalogues can be presented as one
single big catalogue with the GAP flags indicating the
increasing purity.



TABLE 2 

Catalogue statistics for N ≥ 5

catalogue   c_1^a   p_1^b   c_2/c_1   p_2/p_1   S_gal^c   f_I^d
FOF         0.85    0.78    0.92      0.92      0.87      0.19
1WM         0.81    0.82    0.93      0.92      0.81      0.17
2WM         0.77    0.83    0.95      0.92      0.72      0.17

Note. The precise definitions of the statistical
quantities are given in § 3.2.
a One-way completeness
b One-way purity
c Galaxy success rate
d Interloper fraction



3.4. Results on the mocks 

In this section, we summarize our findings and give
a detailed statistical description of the FOF catalogue
with its two sub-catalogues (1WM and 2WM).

The statistics of the merged catalogues in comparison
with the reference FOF catalogue are shown in Figure 8
and, for N ≥ 5, in Table 2. The lines exhibit
the mean among the 24 mocks and the errorbars their
scatter. The basic FOF catalogue has a completeness
c_1 ~ 0.85 that is almost independent of the richness N and
a purity p_1 ~ 0.78 that depends only weakly on N. Only
for N = 2 is there a significant decrease in both completeness
and purity. The corresponding statistics for
the 1WM and 2WM sub-catalogues have almost identical
dependences on N but, as expected, their c_1 is lower
and p_1 higher.

It can be seen that the gain of the 2WM catalogue
compared to 1WM in terms of both purity and interloper
fraction is much smaller than the deficit in terms of completeness
and galaxy success rate. This indicates that by
keeping only group galaxies with a GAP = 2, many real
group galaxies are removed, but only a relatively small
number of interlopers are eliminated. By contrast, the
gain in purity of the 1WM with respect to the reference
FOF is quite comparable to the associated decrease in
completeness. Thus, while the 1WM catalogue is a useful
construction, little is gained by the more restrictive
2WM catalogue. In the remainder of this paper, we will
mainly refer to the FOF and its 1WM sub-catalogue. We
note that not only do the ratios c_2/c_1 and p_2/p_1 behave
well as a function of N for the three catalogues, but also
c_2/c_1 ~ p_2/p_1. This means that the contributions of
over-merging and fragmentation are not only small, but
are also well-balanced.

So far, we have considered the statistics averaged over
the whole redshift range, i.e. 0.1 < z < 1. In Figure
9, the completeness (blue line) and the purity (red line)
of the FOF catalogue are shown as functions of redshift
for several richness classes N. The curves are consistent
with a relatively constant completeness and purity with
redshift. Only the highest redshift bins for N < 4 show
possibly a slight decrease. This further emphasizes the
robustness of our catalogue. Figure 10 shows how the
galaxy success rate S_gal and the interloper fraction f_I
behave as a function of the normalized projected distance
from the group centers.

Fig. 8. Statistics of the FOF catalogue and its two sub-catalogues, 1WM and 2WM, as a function of richness N. For all panels, blue refers to the
FOF groups, red to the one-way-matched groups and green to the two-way-matched groups. The errorbars show the scatter among the 24
mocks. The upper left panel exhibits completeness and the upper right panel purity. The solid lines correspond to c_1 and p_1, respectively,
and the dashed lines to c_2 and p_2, respectively. The lower right panel shows the galaxy success rate S_gal and the lower left panel the interloper
fraction f_I.

Fig. 9. Completeness and purity of the FOF groups as a function
of redshift for 8 different richness classes. In each panel, the
blue solid line corresponds to the mean c_1-completeness and the
red solid line to the mean p_1-purity, whereas the errorbars exhibit
the scatter among the 24 mocks. The dashed lines are for the corresponding
two-way quantities. The richness class is
indicated in each panel.

Fig. 10. Behaviour of the galaxy success rate S_gal and the
interloper fraction f_I as a function of the normalized projected distance
r from the group centers, where r is defined in Equation (22).
The left lower panel shows the galaxy success rate S_gal, where the
blue line corresponds to the FOF and the red line to the 1WM catalogue.
The left upper panel shows the distribution of real group
galaxies as a function of separation from the group centers. It is
clear that at r < 1.5, where most real group galaxies reside, S_gal
is > 0.9 for FOF groups and only slightly lower for 1WM groups.
The right lower panel exhibits the interloper fraction f_I and the right
upper panel the distribution of galaxies in reconstructed groups as
a function of r, where blue corresponds to FOF and red to 1WM.

The distance variable r is defined for each group galaxy
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as 




where is its separation from the group center in a, 
and A^ia the second moment in a among all members of 
this group. Similar definitions hold for 0doc and A^dcc- 
Only groups with 3 or more members are taken into ac- 
count, since for groups with only 2 members r becomes 
meaningless. 

The left lower panel shows the galaxy success rate S_gal
as a function of r from the real group centers. As one
would expect, it increases toward the group centers. Fortunately,
the group centers are also the region where most
of the real group galaxies reside (left upper panel). Note
that S_gal can decrease, in principle, in two ways: first,
by failing to identify certain real group galaxies in
successfully detected real groups, and second, by failing
to detect a real group at all. The small deficit in S_gal for
r < 1 is due to the second reason, while the first reason
becomes more important with increasing r.

In the right lower panel, the interloper fraction f_I is
plotted as a function of r, where r is now related to the
centers of the reconstructed groups. As expected, the interloper
fraction shows the opposite behaviour as a function
of r. However, the difference in f_I between galaxies near to and
far from the group centers is less strong than for
S_gal. For small r, the most important contribution to f_I
comes from spuriously detected groups with 3 members.

Finally, Figure 6 shows the numbers of reconstructed
groups relative to the number of real groups. As was already
mentioned, the mean difference between the number
of reconstructed FOF groups and the number of real
groups exceeds the uncertainty expected from cosmic variance
from mock to mock for N < 5, while the groups of
the 1WM sub-catalogue are well within this region.

According to the statistics discussed in this section,
particularly in Figure 8, it has become clear that the FOF
group catalogue along with its 1WM sub-catalogue has
the potential to be useful for many different applications
such as galaxy evolution studies, group statistics, or gravitational
lensing. For example, if one aims to study the
evolution of galaxies in groups, a high purity and a low
interloper fraction are desirable, so the 1WM catalogue
is probably appropriate. On the other hand, in order to
have a relatively pure sample of field galaxies, galaxies not
contained in the basic FOF catalogue should be selected.
Generally, whenever small groups, the number
of groups, or the purity of the group sample is important, the
1WM catalogue is to be preferred to the FOF catalogue.

3.5. Comparison with DEEP2 

For DEEP2, Gerke et al. (2005) optimized their VDM
groupfinder in order to obtain the correct number of reconstructed
groups N(σ, z) as a function of velocity
dispersion σ and redshift z. As a result, they present two
group catalogues: an "optimal" catalogue and one with
maximized purity. Since Gerke et al. (2005) did not treat
completeness and purity as a function of richness N, all
their statistics correspond to N ≥ 2.

The statistics for their optimal parameter set are c_1 =
0.782 ± 0.006, p_1 = 0.545 ± 0.005, S_gal = 0.786, and
f_I = 0.458 ± 0.004. The ratios between the two-way and
the one-way quantities are therefore c_2/c_1 = 0.919 and
p_2/p_1 = 0.987. So in comparison with our own FOF N ≥
2 statistics, their completeness c_1 and galaxy success rate
S_gal are 3% and ~6% lower, respectively, while their
purity p_1 is ~17% lower, and their interloper fraction f_I
~56% higher.

We conclude that, compared with the DEEP2 "optimal"
group catalogue, the performance of our FOF group
catalogue is very high. Moreover, it would be very interesting
to compare the statistics for the higher richness
classes as well. Since Gerke et al. (2005) optimized
their catalogue using all groups with N ≥ 2, their catalogue
should be optimal regarding the N ≥ 2 statistics.
But, in contrast to a multi-run catalogue, this might not
be the case for the higher richness statistics, since the
N ≥ 2 statistics are actually dominated by 2-member
groups, which are by far the most abundant. This suggests
that the relative superiority of our FOF catalogue over
the DEEP2 catalogue could be even higher for the higher
richness classes.

4. THE REAL DATA 10K GROUP CATALOGUE

In this section, the real data 10k group catalogue is presented.
It is given by means of Tables 3 and 4. Table
3 is a list of all groups along with their properties, and
Table 4 is a list of all group galaxies. The group galaxies
are associated to their group by means of the unique
group-ID. The galaxy-IDs refer to the 10k catalogue published
by S. J. Lilly et al. (2009, in preparation).

4.1. Group purity parameter 

Since we are presenting the FOF catalogue along with
its two sub-catalogues, defined by the GAP parameter in
the final column, any group property can, in principle,
be calculated for all three catalogues. For instance, it is
possible to assign to each group three observed richnesses
N. To avoid confusion and to keep the discussion simple,
all group properties given in Table 3 correspond to the
basic FOF catalogue. In order to quantify the number
of 1WM galaxies in a certain group, we introduce the
group purity parameter (GRP_i) for i = 1, 2, defined by
the fraction of FOF members having a galaxy purity parameter
GAP ≥ i. For i = 1 this is the fraction of FOF
members that are also 1WM members, and for i = 2 the fraction that
are also 2WM members. Note that if the GRP_1 is zero,
then there is no association between the FOF group and
a VDM group.
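
As a minimal sketch, the GRP_i of a single FOF group follows directly from the GAP flags of its members. The function name and the toy list of GAP values are hypothetical.

```python
def group_purity(member_gaps, i):
    """GRP_i: fraction of the FOF members of one group with GAP >= i (i = 1 or 2)."""
    if i not in (1, 2):
        raise ValueError("i must be 1 or 2")
    return sum(1 for gap in member_gaps if gap >= i) / len(member_gaps)


# Hypothetical GAP flags of the ten members of a toy FOF group.
toy_gaps = [2, 2, 2, 1, 1, 0, 2, 2, 2, 2]

grp1 = group_purity(toy_gaps, 1)  # fraction of members that are also 1WM members
grp2 = group_purity(toy_gaps, 2)  # fraction of members that are also 2WM members
```

By construction GRP_2 can never exceed GRP_1, since every galaxy with GAP = 2 also satisfies GAP ≥ 1.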

The statistics of the number of groups and the GRP_1
are summarized in Table 5. The basic FOF catalogue
contains 800 groups with N ≥ 2, 102 groups with N ≥ 5,
and 23 groups with N ≥ 8. Over 80% of the groups with
N ≥ 2 have a GRP_1 greater than zero, i.e. these groups
have at least one group galaxy that was independently
recovered by both FOF and VDM. For the groups with
N ≥ 5, the fraction of groups with GRP_1 > 0 rises to
95%, and for those with N ≥ 8 it is 96% (22 out of 23).
Figure 7 shows the comparison between the real data
FOF and VDM catalogues.

The mean GRP as a function of richness N is given
in Figure 11. The blue solid line shows the mean GRP_1
taking into account all groups with ≥ N members. There is a slight
and noisy rise from about 0.8 for N ≥ 2 to 0.9 for N ≥ 9
due to the fact that the fraction of groups with GRP_1
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TABLE 3 

Group catalogue (excerpt)

Group-ID   N^a   <α>        <δ>      <z>      σ^b      M^c        GRP_1^d
                 (deg)      (deg)             (km/s)   (M_sun)
0          10    150.0087   2.0287   0.0788   409      8.20e12    0.9
1          7     149.4817   2.5073   0.0919   393      9.58e12    1
2          20    150.3004   2.4489   0.1231   442      6.09e13    0.95
3          6     150.3444   2.1544   0.1222            1.13e13    1
4          14    149.8568   1.8151   0.1243   532      1.92e13    0.93
5          6     149.7238   2.399    0.1252            1.23e13    1
6          6     150.2824   2.1531   0.1686   69       5.98e12    0.5
7          9     150.078    2.2136   0.1865   242      2.18e13    1
8          12    150.1122   2.3564   0.2208   596      2.71e13    0.92
9          10    150.4526   2.6799   0.2179   768      2.83e13    1
10         6     150.2371   1.9404   0.2188   238      1.13e13    1
11         6     150.2304   2.5608   0.2207   530      2.29e13
12         6     150.1636   2.0342   0.2208   355      1.35e13    1
13         8     150.2494   2.6574   0.2672   403      3.85e13    1

Note. The full table is available in electronic form.
a Observed richness
b Velocity dispersion (see § 4.3)
c Virial mass of the DM halo (see § 4.4)
d Group purity parameter (GRP_1) (see § 4.1)



TABLE 4 
Group galaxies (excerpt) 



TABLE 5 

Number of groups

≥ N   N_gr^a   f(GRP_1 > 0)^b   <GRP_1>
2     800      0.82             0.80
3     286      0.86             0.81
4     150      0.95             0.88
5     102      0.95             0.88
6     59       0.95             0.87
7     36       0.94             0.86
8     23       0.96             0.88
9     17       1.00             0.93

a Number of groups
b Fraction of groups with a group purity parameter (GRP_1) larger than zero.



equals zero is slightly bigger for smaller N. On the other
hand, the dashed blue line shows the mean GRP_1 taking
into account only groups with a non-zero GRP_1, i.e. only
groups simultaneously found by both groupfinders. For
these groups, the GRP_1 is slightly decreasing since for
bigger groups it becomes easier for the two groupfinders



Galaxy-ID   Group-ID   N    α          δ        z        GAP^a
                            (deg)      (deg)
818787      0          10   150.0605   2.0067   0.0785   2
818888      0          10   150.0365   2.0249   0.0794   2
818934      0          10   150.0241   1.9687   0.0779   2
818935      0          10   150.0239   2.0727   0.0779   2
818982      0          10   150.0134   2.0296   0.0791   2
819035      0          10   149.9989   1.9858   0.0805   0
819041      0          10   149.9984   2.0351   0.0789   2
819060      0          10   149.9912   1.9912   0.0797   2
819118      0          10   149.9724   2.1054   0.0781   2
819133      0          10   149.9681   2.0673   0.0779   2
842033      1          7    149.4897   2.5164   0.0913   2
842048      1          7    149.4844   2.4991   0.0907   2
842049      1          7    149.4839   2.5211   0.0915   2

Note. The full table is available in electronic form.
a Galaxy purity parameter (see § 3.3.2)



Fig. 11. Mean GRP as a function of observed richness N. The
blue solid line shows the GRP1 and the red solid line the GRP2.
The dashed lines show the corresponding GRPi, i = 1, 2, taking
into account only groups with a nonzero GRPi.

to disagree on one or two galaxies in the outskirts of
the group. The red lines in Figure 11 show the same
quantities for GRP2.

4.2. Corrected richness N_corr

The distribution of FOF groups as a function of redshift
for three richness classes N is shown in Figure 12.
Comparing the black histograms (groups) with the red
dashed lines (all galaxies), it is clear that the number
of groups at a given redshift scales with the number of
galaxies at the same redshift. This holds for all richness
classes, although for the richest groups (N ≥ 8) there
is a lack of groups at redshifts z > 0.5. In the framework
of the hierarchical cold dark matter (CDM) structure
formation scenario, we expect the cluster mass function
to grow with time (for a review see Voit 2005). This
growth should be reflected in a decrease of the number
of groups of a given richness with redshift.

Fig. 12. Number of groups as a function of redshift for different
richness classes N. The top panel shows the number of groups N_gr
for groups with N ≥ 2, the middle panel for N ≥ 5, and the bottom
panel for N ≥ 8. The red dotted line shows the number of galaxies
N_gal of the galaxy sample, scaled down for comparison with the
groups. The distribution of groups clearly follows the
distribution of galaxies.

In order to address this question, it is necessary to
correct the observed richness of a cluster to produce an
intrinsic richness that is redshift-independent. We therefore
introduce the corrected richness N_corr, which corrects the
observed richness N for the spatial sampling rate and the
redshift success rate and considers for each group only
the members brighter than a given absolute
magnitude limit M_b,lim(z), i.e., for each group


N_{\rm corr}(M_{b,{\rm lim}}(z)) = \sum_i \frac{1}{C_{{\rm ss},i}} \, \frac{1}{C_{{\rm rsr},i}}    (23)


where the sum is over the members of the group with
M_b < M_b,lim(z), and C_ss,i and C_rsr,i are the sampling
rate and the redshift success rate, respectively, for
galaxy i. The redshift dependence of the absolute magnitude
limit is always taken to be M_b,lim(z) = M_b,lim − z,
where the subtraction of the redshift accounts approximately
for the luminosity evolution of the galaxies.
N_corr can therefore be characterized simply by M_b,lim, the
absolute magnitude limit at redshift zero. Absolute magnitudes
were obtained by means of standard multicolor
spectral energy distribution (SED) fitting using an updated
version of the ZEBRA code (Feldmann et al. 2006,
P. Oesch et al. 2009 in preparation).

Fig. 13. Correlation between N_corr(−20) and M_fudge (see § 4.4)
for the 10k groups. Shown are all groups having a redshift
z < 0.8, so that the sample is volume limited for M_b,lim(z) =
−20 − z. The solid line is a linear regression through the points,
and the dashed line is the same quantity for the reconstructed
groups in the mocks (not shown here). The dotted line shows the
linear regression for the N_true(−20)-M relation for the real groups
in the mocks. Taking into account the overestimation of N_corr by
about 50%-100% (see the text), the dotted curve can be reconciled
with the solid one.
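
As an illustration of the corrected richness defined above, the following minimal Python sketch sums the inverse sampling and success rates over the members brighter than the evolving magnitude limit. The class name `Member`, its field names, and the function names are hypothetical and chosen only for this example; the actual zCOSMOS pipeline is not described in the text.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One group member (hypothetical container for this sketch)."""
    abs_mag_b: float      # absolute magnitude M_b
    sampling_rate: float  # spatial sampling rate C_ss,i, in (0, 1]
    success_rate: float   # redshift success rate C_rsr,i, in (0, 1]


def mag_limit(z, mag_limit_z0=-20.0):
    """Evolving limit M_b,lim(z) = M_b,lim - z."""
    return mag_limit_z0 - z


def corrected_richness(members, z_group, mag_limit_z0=-20.0):
    """Sum 1 / (C_ss,i * C_rsr,i) over members with M_b < M_b,lim(z)."""
    limit = mag_limit(z_group, mag_limit_z0)
    return sum(
        1.0 / (m.sampling_rate * m.success_rate)
        for m in members
        if m.abs_mag_b < limit
    )


if __name__ == "__main__":
    group = [
        Member(-21.0, 0.5, 0.8),   # bright enough, weight 2.5
        Member(-20.1, 0.5, 1.0),   # fainter than the limit at z = 0.5
        Member(-22.0, 0.25, 1.0),  # bright enough, weight 4.0
    ]
    print(corrected_richness(group, z_group=0.5))
```

A group with no member brighter than the limit gets a corrected richness of zero in this sketch, which mirrors the remark in the text that such groups cannot be corrected for the sampling rate at all.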

If we denote the actual number of group members
brighter than M_b,lim(z) in a real group in the
mocks (without the spatial or redshift sampling rates)
by N_true(M_b,lim), then we find that, for reconstructed
groups exhibiting a two-way-match to real groups, the
estimated N_corr(−20) exhibits a relatively large scatter
(±50%) compared with N_true(−20) of the corresponding
real groups. Furthermore, N_corr(−20) on average
overestimates N_true(−20) by about 50%-100%, depending on
N. This is because (1) for most groups we are in the low
number regime, (2) the sampling rate in the 10k sample
is rather low (so the corrections are big and noisy),
(3) groups with no galaxy brighter than M_b,lim(z) cannot
be corrected for sampling rate at all, and (4) the
reconstructed groups are affected by interlopers. One
should therefore be cautious in interpreting N_corr as the
actual richness of the groups. Nevertheless, N_corr(−20)
shows a relatively tight correlation with the estimated
halo mass M_fudge (see § 4.4), even for N ≥ 2, as is shown
in Figure 13, and thus is still a useful quantity. The
analysis of the redshift distribution for groups with a given
N_corr(−20) is performed in § 5.2.

4.3. Velocity dispersion estimation 

The corrected richness N_corr discussed in the last section
is probably the simplest and most straightforward
characterization of a group. However, there are other
characterizations of groups which may be more directly
useful from a physical point of view, such as velocity
dispersion σ or dynamical mass M. Since most of our
groups have richness N < 10, we are in a low number
regime, where the estimation of both velocity dispersion
and mass is non-trivial.

According to Beers, Flynn & Gebhardt (1990) the
best estimators for velocity dispersion in groups with
few members are the gapper estimator and the simple
standard deviation. On the other hand, the biweight
estimator seems to work very well on a large range of
richness classes N, except for N < 20 where its performance
is lower but still sufficient. For comparison, we
have implemented all three estimators. None of them
is significantly superior to the others when applied to the
mocks, so we adopt the gapper estimator since
it is the most commonly used among the three.

The implementation for a group with N members is as
follows. First, for each group member i we computed
the redshift difference dz_i with respect to the mean
group redshift z_gr. These redshift differences were then
converted into velocities by

dv_i = c \, \frac{dz_i}{1 + z_{\rm gr}}    (24)

with c the speed of light. Then, after sorting the velocities
dv_i in ascending order, the gapper estimate is given by

\sigma_{\rm gap} = \frac{\sqrt{\pi}}{N(N-1)} \sum_{i=1}^{N-1} w_i g_i    (25)

where the weights w_i and the gaps g_i are defined by

w_i = i(N - i)    (26)
g_i = dv_{i+1} - dv_i    (27)


for i = 1, ..., N − 1. However, in order to have a realistic
estimate of the velocity dispersion of our group, we have
to correct σ_gap for our redshift uncertainty σ_v of roughly
100 km s^-1. This is done by

\hat{\sigma} = \sqrt{3} \, \sqrt{\sigma_{\rm gap}^2 - \sigma_v^2}    (28)

where σ̂ is the final estimate of the velocity dispersion σ.
The factor √3 converts the line-of-sight velocity dispersion
to the 3D velocity dispersion. If σ_v is larger than
σ_gap, we formally set σ̂ to zero.
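
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `gapper_dispersion` is hypothetical, the gapper normalization follows the standard form given by Beers, Flynn & Gebhardt (1990), and the correction for the redshift uncertainty is assumed to be a subtraction in quadrature followed by the factor √3, which is an assumption consistent with the description in the text.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def gapper_dispersion(redshifts, sigma_v=100.0):
    """Return (sigma_gap, sigma_hat) in km/s for one group.

    sigma_v is the velocity uncertainty from the redshift errors.
    The quadrature subtraction below is an assumed form of the correction.
    """
    n = len(redshifts)
    if n < 2:
        raise ValueError("need at least two group members")

    # Velocity offsets relative to the mean group redshift.
    z_gr = sum(redshifts) / n
    dv = sorted(C_KM_S * (z - z_gr) / (1.0 + z_gr) for z in redshifts)

    # Weighted sum over gaps: w_i = i (N - i), g_i = dv_{i+1} - dv_i.
    total = sum(i * (n - i) * (dv[i] - dv[i - 1]) for i in range(1, n))
    sigma_gap = math.sqrt(math.pi) / (n * (n - 1)) * total

    # Remove the measurement error; clip to zero if it dominates.
    if sigma_v >= sigma_gap:
        return sigma_gap, 0.0
    sigma_hat = math.sqrt(3.0) * math.sqrt(sigma_gap**2 - sigma_v**2)
    return sigma_gap, sigma_hat


if __name__ == "__main__":
    z_members = [0.3480, 0.3495, 0.3502, 0.3511, 0.3530]
    print(gapper_dispersion(z_members))
```

Because the estimate is clipped at zero whenever the measurement error exceeds the raw gapper value, low dispersions are pushed downward, which is the bias for small σ discussed in connection with the subtraction in equation (28).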

Since the COSMOS lightcones (Kitzbichler & White
2007) provide only the "virial velocity" v_vir of the DM
halos and not directly the "velocity dispersion" σ, we
cannot precisely estimate the uncertainty of the estimated
σ̂ for a group. But comparing σ̂ to v_vir should
provide an upper limit to the uncertainty. To take into
account the influence of interlopers on σ̂, we considered
the estimated velocity dispersion of reconstructed groups
exhibiting a two-way-match with real groups. (Wrongly
detected groups do not exhibit a meaningful velocity
dispersion.)

We find that for N ≥ 5 the ratio between the median
virial velocity v_vir and σ̂ remains roughly constant for
σ̂ > 350 km s^-1, and exhibits an error of about 25%
(upper and lower quartile) (Figure 14). Note that the
estimated σ̂ do not need to fall exactly on the 45° line,
since σ and v_vir are not exactly the same quantities. For
σ̂ < 350 km s^-1 the estimated velocity dispersion σ̂ is
biased to lower values due to the subtraction in equation
(28). On the other hand, for N < 5, the correlation between
σ̂ and v_vir is very weak, so that σ̂ for these
richness classes contains almost no information. Hence,
we have decided to assign no estimated velocity dispersion
to groups with N < 5 in the group catalogue table.

Footnote: In the COSMOS lightcones, the virial velocity is simply
defined by v_vir = \sqrt{G M_200 / r_200}, where G is the gravitational
constant, and M_200 and r_200 are the virial mass and the virial radius,
respectively, related by M_200 = (4/3) π r_200^3 200 ρ_c(z), with ρ_c(z)
the critical density of the universe at the redshift of the halo.

Fig. 14. Correlation between the estimated velocity dispersions
σ̂ of groups with N ≥ 5 and the virial velocities v_vir of the DM
halos. Each point displays a reconstructed group exhibiting a
two-way-match to a real group whose DM halo yields v_vir. For
σ̂ ≲ 350 km s^-1 the estimated velocity dispersion clearly
underestimates the virial velocity.
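
To make the footnote definition of the virial velocity concrete, the following Python sketch evaluates v_vir from M_200 and the redshift. The flat ΛCDM parameters (H0 = 73 km s^-1 Mpc^-1, Ω_m = 0.25) are assumptions made for this example only, and the function names are hypothetical.

```python
import math

# Gravitational constant in Mpc (km/s)^2 / Msun.
G_MPC = 4.30091e-9


def hubble(z, h0=73.0, omega_m=0.25):
    """H(z) in km/s/Mpc for a flat LambdaCDM model (assumed parameters)."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def critical_density(z, h0=73.0, omega_m=0.25):
    """rho_c(z) = 3 H(z)^2 / (8 pi G) in Msun / Mpc^3."""
    return 3.0 * hubble(z, h0, omega_m) ** 2 / (8.0 * math.pi * G_MPC)


def virial_radius(m200, z):
    """Solve M_200 = (4/3) pi r_200^3 * 200 rho_c(z) for r_200 (Mpc)."""
    rho_c = critical_density(z)
    return (3.0 * m200 / (4.0 * math.pi * 200.0 * rho_c)) ** (1.0 / 3.0)


def virial_velocity(m200, z):
    """v_vir = sqrt(G M_200 / r_200) in km/s, with m200 in Msun."""
    return math.sqrt(G_MPC * m200 / virial_radius(m200, z))


if __name__ == "__main__":
    for mass in (1e13, 1e14):
        print(mass, virial_velocity(mass, z=0.3))
```

Combining the two relations gives v_vir = (10 G M_200 H(z))^(1/3), so at fixed mass the virial velocity rises slowly with redshift through H(z).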

Note that applying the velocity dispersion estimation 
to the real groups instead of the reconstructed groups 
does not significantly alter these results. Even estimat- 
ing the velocity dispersion for the real groups in the 10k 
mocks taking into account all galaxies down to r < 26 
still yields a scatter of about 10 to 15%. 

4.4. Estimation of dynamical mass 

Estimating the dynamical mass of the underlying dark
matter halo of a group is even more difficult than
estimating the velocity dispersion. The simplest method for
the estimation of dynamical mass is by using some form
of the virial theorem. The standard relation is (e.g., Eke
et al. 2004)

M = A \, \frac{\sigma^2 r_\perp}{G}    (29)

where A is a constant depending on the mass distribution
of the halo (e.g., geometry, concentration, etc.), σ the
estimated velocity dispersion, r_⊥ some estimate of
its projected radius, and G the gravitational constant.
Heisler, Tremaine & Bahcall (1985) discuss four simple
mass estimators, each being only a function of the
projected distances and radial velocities of the group
galaxies with respect to the group center. In
applying them to the reconstructed groups in the mocks,
none of them works substantially better than the simple
relation in equation (29) and all show a similar
behaviour, so we consider only the standard virial theorem.
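
For orientation, the virial relation can be written as a short Python function. This is only a sketch under the assumption that the relation has the form M = A σ² r_⊥ / G; the constant A is left as a required argument because the text states that it has to be calibrated, and the function name is hypothetical.

```python
# Gravitational constant in Mpc (km/s)^2 / Msun.
G_MPC = 4.30091e-9


def virial_mass(a_const, sigma_km_s, r_perp_mpc):
    """Dynamical mass in Msun from M = A * sigma^2 * r_perp / G.

    a_const    : dimensionless calibration constant A (must be supplied)
    sigma_km_s : velocity dispersion in km/s
    r_perp_mpc : projected radius estimate in Mpc
    """
    if sigma_km_s < 0 or r_perp_mpc < 0:
        raise ValueError("sigma and r_perp must be non-negative")
    return a_const * sigma_km_s**2 * r_perp_mpc / G_MPC


if __name__ == "__main__":
    # Illustrative numbers only; A = 1 is a placeholder, not a calibration.
    print(virial_mass(1.0, 400.0, 0.5))
```

Since the mass scales with σ², the quartile error of about 25% quoted for σ̂ alone already translates into a considerably larger relative error on the mass.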

Fig. 15. Correlation between the "fudge mass" M_fudge of
groups with N ≥ 5 and the virial mass M_halo of the corresponding
DM halos. Each point displays a reconstructed group exhibiting
a two-way-match to a real group, to whose DM halo mass it is
compared.

To use the estimator in equation (29), the constant of
proportionality A needs to be calibrated properly. Doing
this with the mocks and using an appropriate estimate
of the projected radius, we find a behaviour for
the estimated masses similar to that for the velocity
dispersion. For N < 5 there is only a very weak correlation
between the estimated mass and the actual mass of the
underlying DM halo. For N ≥ 5 there is a correlation, but only for

M > 5 X 10^^ Mq, and with an error (upper and lower 
quartiles) of roughly a factor of 2 with respect to the
median. Since these mass estimates would be of relatively
limited use and since the mocks are needed for calibration
anyway, we have instead pursued another approach.
It turns out that using the observed richness N, corrected
for sampling and redshift success rate, and the redshift z as
proxies for the mass and calibrating them with the mocks
works rather well. The mass for a group with observed
richness N and redshift z is then simply given by

M_{\rm fudge} = \langle M_{\rm halo}(N, z) \rangle    (30)

where M_halo(N, z) denotes the mass of a halo at redshift
z containing N galaxies, N is the observed richness of the
group corrected for sampling and redshift success rate,
and the angle brackets denote the average over the halos
in the 24 mocks. We will call this mass the "fudge
mass" to indicate that it is calibrated with the mocks.
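
The calibration in equation (30) amounts to a lookup table of mean halo masses built from the mocks. The Python sketch below shows one possible way to do this; the redshift binning, the rounding of the corrected richness to the nearest integer, and all function names are assumptions made for illustration, since the text does not specify how the average is binned.

```python
from collections import defaultdict


def z_bin(z, z_edges):
    """Index of the redshift bin containing z, or None if outside."""
    for k in range(len(z_edges) - 1):
        if z_edges[k] <= z < z_edges[k + 1]:
            return k
    return None


def build_fudge_table(mock_halos, z_edges):
    """Average halo mass per (richness, redshift bin).

    mock_halos: iterable of (richness, redshift, halo_mass) tuples
    collected from all mocks.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for richness, z, mass in mock_halos:
        k = z_bin(z, z_edges)
        if k is None:
            continue
        acc = sums[(int(round(richness)), k)]
        acc[0] += mass
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def fudge_mass(table, richness, z, z_edges):
    """Look up <M_halo(N, z)>; returns None if the bin is empty."""
    return table.get((int(round(richness)), z_bin(z, z_edges)))


if __name__ == "__main__":
    edges = [0.2, 0.4, 0.8]
    halos = [(3, 0.25, 1e13), (3, 0.28, 3e13), (3, 0.6, 5e12), (5, 0.25, 6e13)]
    table = build_fudge_table(halos, edges)
    print(fudge_mass(table, 3.2, 0.3, edges))
```

A lookup of this kind returns a mass for any richness that occurs in the mocks, which is why the fudge mass remains defined for small groups where the virial estimate carries little information.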

The fudge masses of the reconstructed groups with
N ≥ 5 exhibiting a two-way-match to their real groups
are shown in Figure 15. In contrast to the velocity
dispersion estimates, there is no bias for small masses.
The error for the masses (upper and lower quartiles) is
about 50%. Furthermore, the masses are also defined
for small groups, although the upper quartile increases
toward 100% for N = 2. The lower quartile does not
change significantly.

4.5. Manual intervention: The example of a 
super-group at z ~ 0.22 

After applying our group-finding procedure to the real
10k sample, we encountered a huge structure for which
FOF and VDM failed to yield even roughly consistent
results. We mention this particular case because it
illustrates some of the weaknesses of our adopted
group-finders and ultimately required a special treatment. It
was already mentioned by Finoguenov et al. (2007) and
is probably an example of a "super-group", where several
smaller groups are just about to merge (Smolcic et
al. 2007). The redshift of the system is z ∼ 0.22.
An example of such a super-group in another field at
z = 0.37 is given by Gonzalez et al. (2005) and Kautsch et
al. (2008).

The projected galaxy distribution of this structure is
shown in Figure 16. The upper left panel shows the
group assignment of the multi-run FOF catalogue, and
the upper right panel the group assignment of the
multi-run VDM. Each group is denoted by a symbol (e.g.,
square, triangle) of a particular color, and field galaxies
by black points. This example of the super-group gives
us some interesting insights concerning the group-finding
procedure.

This extended structure exhibits the main potential 
problems of both the FOF and VDM algorithms. The 
FOF algorithm connected practically all the galaxies in 
this super-group, without distinguishing between differ- 
ent sub-groups. This behaviour is well known for FOF, 
and it happens in particularly dense regions such as this. 
The problem is that any single galaxy between two of 
these sub-clusters will act as a bridge for the FOF al- 
gorithm to connect the two clusters. The VDM is more 
successful in distinguishing different sub-structures, but 
nevertheless fails to do a perfect job. A casual glance sug- 
gests that the "green square" VDM cluster in fact con- 
sists of two independent sub-groups (consistent with the 
X-ray contours). Furthermore, the "red triangle" VDM 
group exhibits two outliers to the South which almost 
certainly do not belong to this group. The occurrence 
of such outliers is not uncommon in VDM groups. It is 
related to the fact that in the VDM groupfinder every 
second order Delaunay-neighbour in the second cylinder 
is accepted as group member and that the second cylin- 
der is usually much bigger than the third cylinder (see 

§[SI11). 
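
The bridging behaviour of FOF described above can be reproduced with a toy example. The following Python sketch links points in a plane with a single linking length and a union-find structure; it is a deliberately simplified illustration (the actual group-finder works in redshift space with its own parameters), and the function name and coordinates are made up for this example.

```python
def friends_of_friends(points, linking_length):
    """Group 2D points: two points are friends if closer than linking_length.

    Returns a sorted list of groups, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the root of i with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    limit_sq = linking_length**2
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= limit_sq:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    clump_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
    clump_b = [(2.0, 0.0), (2.5, 0.0), (2.0, 0.5)]
    bridge = [(1.25, 0.0)]  # a single galaxy between the two clumps

    print(friends_of_friends(clump_a + clump_b, 1.0))           # two groups
    print(friends_of_friends(clump_a + clump_b + bridge, 1.0))  # one group
```

In this toy configuration a single point between the two clumps is enough to merge them into one group, which is the failure mode seen for the super-group.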

Since we have accepted the FOF catalogue as the basis
catalogue, and since this super-group is the only case
where FOF is in such obvious disagreement with VDM,
we decided to just correct this single structure manually
after visual inspection. The final result is shown in the
lower panel of Figure 16. The groups created manually
by this intervention are groups #8 and #795 to #799.
Although the manual assignment of galaxies to sub-groups
is somewhat arbitrary, looking in redshift space (and not
only at the projected galaxy distribution) the eye
recognizes different sub-structures quite well. This example
emphasizes how important precise redshift information
is. Even if it had been possible to recognize this huge
overdensity with precise photo-z, it would not have been
possible to disentangle the more subtle sub-structure reliably.
This is illustrated best by the "blue upward triangle"
and the "brown downward triangle" groups overlapping
in projection (see lower panel of Figure 16).

5. COMPARISON WITH THE MOCKS AND 2DFGRS 

In this section, we compare our group catalogue
with the mocks. Since zCOSMOS was designed to have
a survey design similar to that of the 2 Degree Field Galaxy
Redshift Survey (2dfGRS, Colless et al. 2001) (see Lilly et al.
2007), albeit at higher redshifts, we also compare
our group catalogue to the 2PIGG group catalogue
(Eke et al. 2004) of 2dfGRS to have a reference point
in the local universe. The 2PIGG catalogue is particularly
appropriate for comparison since it was obtained by
essentially the same FOF algorithm we have adopted.

Fig. 16. Group #8 in the uncorrected FOF group catalogue has been manually split up into several groups (#8, #795 to #799), because
the FOF as well as the VDM method failed (see the text). The upper left panel shows the initial FOF groups, the upper right panel the initial
VDM groups, and the lower panel the FOF groups after manual intervention. Black points denote field galaxies, and the other symbols
(squares, triangles, etc.) are group galaxies, where each group has its own symbol and color. The blue contours show the X-ray emission
of the super-group as observed with XMM-Newton (Finoguenov et al. 2007, 2009 in preparation).

5.1. Number of groups as a function of N 

The most straightforward way to compare the real data
with the mocks is to compare the number of 10k groups
with the number of reconstructed groups in the mocks as
a function of observed richness N. This is shown in
Figure 17. The number of 10k groups is in good agreement
with the number of reconstructed groups in the mocks.
For N < 6 there appears to be a slight excess of 10k
groups relative to the reconstructed groups, but
this excess is not significant.

More significant is the fact that for N ≥ 20 there are
significantly more groups in the mocks than in the real
data. While in the 10k sample there is one group in this
range (group #2 with N = 20), the median number of
reconstructed groups with N ≥ 20 in a 10k mock is 5, with
upper and lower quartiles of 6 and 3, respectively. These
groups are distributed in the redshift range 0.1 ≲ z < 0.7.
Even groups with N > 50 should not be exceptional:
on average there are ∼ 0.5 of these per mock.

Footnote: Even if we regarded the super-group in § 4.5 as a single
group with N ≥ 20, the 10k group catalogue would still contain far
fewer big groups than most of the mocks.

These huge groups are not an artefact of our
reconstructed groups, but are also present in the real group
catalogue. More than 80% of the reconstructed groups
with N ≥ 20 exhibit a two-way-match to a real group, as
one would expect from the earlier tests on the mocks, and
their mean GRP1 is about 0.9. So we conclude that the lack
of big groups in the 10k sample is probably real and not
due to some problem with the groupfinder.

Fig. 17. Number of groups N_gr as a function of observed
richness N. The upper panel shows the absolute number of groups,
where the red lines correspond to the 10k groups, and the blue
lines to the reconstructed groups in the mocks. Solid lines
correspond to FOF groups, and dashed lines to the 1WM groups. The
error bars correspond to the real scatter among the 24 mocks. The
gray shaded area shows the cosmic variance of real groups in the
mocks among the 24 mocks. The lower panel shows the abundance
of groups relative to the reconstructed FOF groups, where all
symbols are the same as in the upper panel.

5.2. Fraction of galaxies in groups

A quantity that is closely related to the number of
groups in a catalogue is the fraction of galaxies in the
sample that are placed in groups. This fraction is shown
as a function of redshift in Figure 18 for N ≥ 2 and
N ≥ 5 in the central region. If the galaxies in the mocks
associated with groups with N ≥ 20 are not considered
as group galaxies (solid blue and black lines), since we
have shown that there are far too many of these groups
in the mocks, the overall behaviour of the fractions in
the 10k sample (red line) matches quite well that of the
reconstructed groups or real groups, at least in the
redshift range 0.1 < z < 0.5. If the galaxies in groups with
N ≥ 20 are treated as normal group galaxies (dashed
blue and black lines), the fraction of group galaxies in the
mocks is too high, as expected.

Fig. 18. Fraction of galaxies in groups in the central region.
The red solid line shows the fraction of galaxies in groups for the
10k catalogue. The black dashed line shows the mean fraction of
galaxies in real groups in the mocks and the green dashed line the
mean fraction of galaxies in reconstructed groups. If only galaxies
in groups with N < 20 are considered as group galaxies, the results
are the solid black and green lines, respectively. The error bars
show the scatter among the 24 mocks.

Noticeable are the lack of group galaxies in the 10k sample
at redshift z ∼ 0.55 and the excess at redshift z ∼ 0.9,
especially for N ≥ 2. The lack of group galaxies at
z ∼ 0.55 coincides with a big underdensity in the 10k
galaxy sample and is clearly visible in Figure 12. The
origin of the excess of group galaxies at high redshift is
less clear. While there are single mocks with a deviation
at z ∼ 0.55 similar to the lack of group galaxies in
the 10k sample, none of the mocks approaches the excess of
group galaxies at redshift z ∼ 0.9. These underdense and
overdense regions are clearly seen in the overall galaxy
density field constructed by Kovac et al. (2009a).

The decline of the fraction of group galaxies with redshift
has two reasons. First, there are fewer groups at high
redshift, as expected from the hierarchical growth of
structure at these scales. Second, the fraction of detectable
groups decreases with redshift, since the galaxy density
decreases and so it becomes more and more improbable
to observe two galaxies residing in the same DM halo.
The second of these has already been discussed in § 2.3
(see the figure discussed there), so we focus here mainly on
the first reason, which is demonstrated in Figure 19. Here
the fraction of galaxies in groups is shown as a function of
N_corr(−20), and all galaxy samples are chosen to be volume
limited with respect to M_b,lim(z). The solid black line
shows the fraction of 10k group galaxies in the redshift
bin 0.2 < z < 0.5, and the dashed black line the fraction
in the redshift bin 0.5 < z < 0.8, each time in the central
region. The magenta and the cyan hatched regions
show the regions enclosed by the upper and lower quartiles
of the corresponding fractions in the mocks for the
low and high redshift bin, respectively. The fractions in
the mocks are rather lower than in the 10k sample,
especially at low redshift, although there are single mocks
which have fractions as high as that in the 10k sample
and higher. The blue line is the fraction of galaxies in
the 2dfGRS-2PIGG groups (Eke et al. 2004) in the
redshift range 0.03 < z < 0.13.

Note that the plotted lines are relatively sensitive to
the absolute magnitudes used to estimate N_corr. If
there are slight systematics in the estimation of the
absolute magnitudes, the lines will be slightly too low or too
high. The absolute magnitudes for the 2dfGRS galaxies
were estimated using the k-correction formula provided
by Norberg et al. (2002). Furthermore, in order to adjust
the 2dfGRS selection effects and completeness (Colless
et al. 2001; Cross et al. 2004) as much as possible
to those of zCOSMOS, some 2dfGRS galaxies were removed
from the sample in a probabilistic way, and N_corr
for 2PIGG groups was estimated in the same way as for
10k groups. The blue shaded region in Figure 19 shows
the uncertainty for the 2PIGG line, allowing for possible
systematics in the absolute magnitude estimation
corresponding to ±0.2 mag. The effect from cosmic variance
between NGP and SGP is much smaller than this.

Fig. 19. Fraction of galaxies in groups with N_corr(−20) or more
members in volume limited samples. The solid
black line shows the fraction of 10k group galaxies in the redshift
bin 0.2 < z < 0.5, and the dashed black line the fraction in the
redshift bin 0.5 < z < 0.8, each time in the central region. The
magenta and the cyan hatched regions display the regions enclosed
by the upper and lower quartiles of the corresponding fractions in
the mocks for the low and high redshift bin, respectively. The blue
line displays the fraction for 2dfGRS-2PIGG groups in the redshift
range 0.03 < z < 0.13, and the blue shaded region is the
uncertainty given by a ±0.2 mag error in the absolute magnitude
estimation of the 2dfGRS galaxies.

Figure 19 clearly shows a decline of the fraction of
group galaxies with redshift. Since N_corr correlates fairly
well with mass (see Figure 15), this decline can
straightforwardly be interpreted in terms of growth of structure
as expected in a hierarchical structure formation scenario
(Voit 2005). In fact, using M_b,lim = −20 as the absolute
magnitude threshold to estimate N_corr, our sample is volume
limited up to z ∼ 0.9, and we have checked with the
mocks that we do not lose groups due to detectability
problems in this sample.

6. SUMMARY 

The aim of this paper was to create a group catalogue
from the spectroscopic zCOSMOS 10k sample, to enable
investigations of the galaxy population in groups over
the redshift range 0.1 < z < 1. The basic group-finding
method was to use a FOF and a VDM groupfinder to
identify galaxy overdensities in redshift space without
regard to individual galaxy properties, using the precise
(σ ∼ 100 km s^-1) velocities available from zCOSMOS.

The performance of both FOF and VDM groupfinders
was extensively tested using realistic mock spectroscopic
samples generated from the COSMOS mock lightcones
(Kitzbichler & White 2007), which reproduce the complex
selection function of the actual spectroscopic survey.
During the extensive testing and comparing of these
groupfinders, we have developed a new method which
progressively optimizes the group-finding parameters for
smaller and smaller groups as the catalogue is generated
from the richest groups down to the poorest. This is
found to optimize the group catalogue fidelity, in terms
of completeness and purity, over a broad range of
richnesses N. Using this new approach, we achieve an
impressively high fidelity of our group catalogue compared
with others in the literature at these redshifts. The standard
FOF algorithm yields a group catalogue with overall
better statistics than the VDM algorithm,
and we have chosen it as the basic group catalogue.
However, the purity of the group sample is significantly
enhanced, at modest cost in completeness, if we also take
the intersection of this FOF main catalogue with an
independent VDM group catalogue, producing the
so-called "one-way-matched" (1WM) sub-catalogue.

With the aid of our mocks, we have a very good idea
of the statistical properties of the group sample. We find
that for FOF groups with N ≥ 5 the completeness is
85% and the purity 78%. For the 1WM catalogue, the
purity rises to 82%, while the completeness drops only
to 81%. For poorer groups with N < 5 the statistics
of purity and completeness are not substantially worse.
These fidelity statistics are fairly stable over the whole
redshift range. As would be expected, the completeness
and "interloper fraction" statistics for group members
are enhanced in the centers of groups. Furthermore, we
find that, while the basic FOF catalogue slightly
overproduces the number of groups with N < 5, the 1WM
sub-catalogue reproduces almost perfectly the number of
real groups down to N = 2.

The actual zCOSMOS 10k FOF group catalogue contains
102 groups with N ≥ 5 and 23 groups with N ≥ 8.
Going down to N = 2 yields a total of 800 groups.
Groups with N ≥ 5 have been assigned a velocity dispersion
σ and a mock-calibrated dynamical mass M whose
uncertainties are understood quite well. While for N < 5
we could still assign a meaningful mass to the groups, a
reasonable estimate of velocity dispersion is not possible.
The fraction of 10k galaxies in groups is about 25% at
low redshift and decreases toward ∼ 15% at z ∼ 0.8.

Comparing the 10k group catalogue to the mocks yields
fairly consistent results. The main discrepancies are that
(1) there are many more groups with N ≥ 20 in the
mocks compared to the 10k sample, and (2) the fraction
of 10k galaxies in groups is significantly higher at z ∼ 0.9
than in the mocks. We find that the fraction of galaxies
in groups, for groups with a given corrected richness
N_corr(−20), decreases from redshift 0.1 to redshift 0.8.
This can be interpreted in terms of growth of structure as
expected in a hierarchical structure formation scenario.

The properties of these groups are explored in a number
of companion papers. The environments of these
groups, in terms of the larger scale galaxy density field in
which they are embedded, are described in Kovac et al. (2009a).
The evolution of the galaxy population in these groups
is explored in A. Iovino et al. (2009, in preparation)
and K. Kovac et al. (2009b, in preparation), in terms of
galaxy colors and galaxy morphologies respectively, and
in Silverman et al. (2009) in the context of active galactic
nuclei. In future studies, we will investigate the X-ray
properties of our groups, and perform a weak lensing and
a galaxy-group cross-correlation analysis.